The Weakest Link - Single Point of Failure

Data communications networks are the central nervous system of nearly every critical industrial system, from the transportation of goods and passengers by rail, road, air, and sea to the supply of electricity, oil and gas, and fresh drinking water. Every sector relies on industrial networks, which form the backbone of our society's infrastructure. The reliability, resilience, and robustness of these critical networks are paramount, and network failure is not an option.

In this article, we describe one of the key foundations for designing and implementing a reliable and resilient data communications network that can, for the most part, be self-sufficient and ensure the continuous availability of data across any mission-critical system.

That key foundation is preventing single points of failure.

What Is a Single Point of Failure?

Almost any system can be affected by a single point of failure. In a data communications network, a single point of failure is any component that, if it fails, can stop the entire system from working and communicating. It could be hardware, such as a router, a switch, a power supply, or a cable. It can also be total reliance on one connection or service, such as your internet service provider, your broadband connection, or your cellular provider or 4G connection.

The single point of failure might even be the person who designs, implements, and maintains the network. Everyone is capable of making mistakes, so having a second pair of eyes to check things is always a good idea.

Visualization of a single point of failure.

Network Design - Build Network Resilience from the Ground Up

Network resilience should be the foundation upon which you build your mission-critical network, so make resilience and failover a top priority from the very beginning of the design. A network built with resilience from the ground up is much easier to maintain than one designed without it, because adding resilience to an existing network can be time-consuming and costly. Changes made to an existing network in response to a failure also tend to be rushed. Without proper testing, those alterations can have a knock-on effect on other elements of the network, leaving you continuously applying patches to mend broken parts.

Things can go wrong; that is just a fact of life. However, the best way to avoid problems is to prevent them from happening in the first place. Looking for single points of failure from the beginning of the design, and testing as you go, will lay strong foundations for everything else.

Hardware Single Point of Failure

What is a hardware single point of failure?

A hardware single point of failure exists when one physical piece of equipment (e.g., a router or switch) has no failover to another device, so that its failure disrupts the rest of the network or its communication.

Visualization of a hardware single point of failure.

Preventing a hardware single point of failure

When you have a single communications device with multiple channels, you might think you have full resilience. Consider the diagram below. Router A, which is solely responsible for communicating with the internet, has a DSL link with a failover to 4G. This provides some resilience, because 4G will take over if the broadband link fails. However, if Router A itself fails, all communication with the internet will be lost.

The diagram below shows the same network, but with two routers sharing responsibility for failover instead of one. Router A is a DSL router and Router B is a 4G router; together they provide failover if the broadband link goes down. This design also keeps communication with the internet running if either router fails.
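
The difference between the two designs can be checked mechanically. The sketch below is a minimal illustration in Python, not a vendor tool: it models each design as a small graph, removes one component at a time, and reports the components whose loss cuts the LAN off from the internet. The node names (`lan`, `router_a`, `dsl`, `4g`, and so on) are assumptions chosen to mirror the two diagrams.

```python
from collections import deque

def reachable(links, start, goal, failed=None):
    """Breadth-first search over undirected links, skipping the failed node."""
    neighbours = {}
    for a, b in links:
        if failed in (a, b):
            continue  # a failed component takes its links down with it
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, start="lan", goal="internet"):
    """Return every component whose failure alone disconnects start from goal."""
    nodes = {n for link in links for n in link} - {start, goal}
    return sorted(n for n in nodes if not reachable(links, start, goal, failed=n))

# Design 1: one router with a DSL link and a 4G failover link.
one_router = [
    ("lan", "router_a"),
    ("router_a", "dsl"), ("dsl", "internet"),
    ("router_a", "4g"), ("4g", "internet"),
]

# Design 2: Router A on DSL, Router B on 4G.
two_routers = [
    ("lan", "router_a"), ("router_a", "dsl"), ("dsl", "internet"),
    ("lan", "router_b"), ("router_b", "4g"), ("4g", "internet"),
]

print(single_points_of_failure(one_router))   # ['router_a']
print(single_points_of_failure(two_routers))  # []
```

The model treats `lan` as a single node, so a lone LAN switch feeding both routers would need the same check applied to it.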

Redundant Power Supply
Some switches allow two power supplies to be connected simultaneously. The secondary power supply keeps the device powered even if the primary power supply fails. Generators and battery backups are two examples of backup power sources. (Check the power rating of the device to select a suitable power supply.)

Software or Configuration Single Point of Failure

What is a software or configuration single point of failure?

Starting with the right physical network devices is key to avoiding a single point of failure at the first level. Redundant hardware alone is not enough, however: if the configuration cannot detect a fault and switch over, the spare device or link never takes effect. The next step is therefore to ensure that the network is configured correctly, so that it can be monitored and can automatically take action, such as providing an alternate path, if something goes wrong with a device or a network provider.
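
One way to picture this monitor-and-fail-over behaviour is a routine that probes each path in priority order and uses the first healthy one. The following Python sketch is illustrative only; real devices normally implement this through their own redundancy or failover features. The `Path` type, the path names, the probe targets, and the assumption that each target is routed over its own path are all hypothetical.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class Path:
    name: str        # hypothetical label, e.g. "dsl" or "4g"
    probe_host: str  # host assumed to be reached through this path
    probe_port: int

def tcp_probe(path: Path, timeout: float = 2.0) -> bool:
    """Report a path as healthy if a TCP connection to its probe target succeeds."""
    try:
        with socket.create_connection((path.probe_host, path.probe_port), timeout=timeout):
            return True
    except OSError:
        return False

def select_active_path(
    paths: Iterable[Path],
    probe: Callable[[Path], bool] = tcp_probe,
) -> Optional[Path]:
    """Return the first healthy path in priority order, or None if all are down."""
    for path in paths:
        if probe(path):
            return path
    return None

# Usage with a stand-in probe so the example runs without network access.
paths = [Path("dsl", "192.0.2.1", 443), Path("4g", "192.0.2.2", 443)]
dsl_down = lambda p: p.name != "dsl"
active = select_active_path(paths, probe=dsl_down)
print(active.name)  # 4g
```

Passing the probe in as a parameter keeps the selection logic testable without touching a live network; a `None` result is the case that should raise an alarm.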

Preventing a software single point of failure
Before purchasing network devices, learn which redundancy protocols are appropriate for your network environment. If you intend to keep existing devices in the network, find out which failover protocols they support so that you can purchase compatible devices. It is often beneficial to stay with the same manufacturer for compatibility and interoperability. Being able to get technical support from the same place is another advantage.

Service Single Point of Failure
It might not occur to you, but if you are only using one network service provider, you have a single point of failure. Consider diversifying your network provider. For example, having the ability to fail over from your broadband provider to a cellular provider eliminates that single point of failure. Similarly, if your 4G router supports two SIM cards, you can diversify your service provider by using SIM cards from different network providers. Roaming SIM cards are also an option if your router supports them.
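
A quick audit can reveal whether nominally redundant uplinks still depend on one provider. The Python sketch below assumes a hypothetical inventory format (uplink name mapped to provider name) and placeholder provider names; it flags any provider that carries every uplink.

```python
from collections import Counter

def shared_provider_risks(uplinks):
    """Return providers that carry every uplink, i.e. a service single point of failure.

    `uplinks` maps an uplink name to its provider name (hypothetical format).
    """
    if not uplinks:
        return []
    counts = Counter(uplinks.values())
    return sorted(p for p, n in counts.items() if n == len(uplinks))

# Hypothetical inventories; provider names are placeholders.
dual_sim_same_provider = {"sim1": "ProviderX", "sim2": "ProviderX"}
dual_sim_diverse = {"sim1": "ProviderX", "sim2": "ProviderY"}

print(shared_provider_risks(dual_sim_same_provider))  # ['ProviderX']
print(shared_provider_risks(dual_sim_diverse))        # []
```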

The final single point of failure – people
Could you be the weakest link?

Having more than one person check a network is always a good idea. A second pair of eyes can often spot areas of vulnerability that the first missed. A system should be in place to deal with potential outages and to ensure that someone is always available to check notifications and alarms. Most importantly, know your network.

Conclusion

A strong network starts with having the right mindset. Give network resilience the highest priority. If resilience is not prioritized at the design stage, or costs are cut, it could end up costing more in the long run due to costly site visits, lack of information, or lack of service.
Know your network objectives. Knowing the purpose of each part of the network, and how the different parts rely on each other, helps you determine where to build resilience.
Eliminate all single points of failure. Identify potential risks by conducting a single point of failure risk assessment across three main areas: hardware, software/services, and people. Create a checklist detailing the general areas for assessment.
Avoid relying on only one person to identify potential risks. The rule of thumb is: two pairs of eyes are better than one.
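
As a starting point, such a checklist can be kept as plain data. The Python sketch below is only one possible shape: the three areas follow this article, while the question wording and the `open_items` helper are illustrative assumptions to adapt to your own network.

```python
# Checklist areas follow the article; the question wording is illustrative.
CHECKLIST = {
    "hardware": [
        "Does every critical router or switch have a failover device?",
        "Do critical devices have a secondary or backup power supply?",
    ],
    "software/services": [
        "Are compatible redundancy or failover protocols configured?",
        "Is the network monitored, with automatic failover to an alternate path?",
        "Are uplinks spread across more than one service provider?",
    ],
    "people": [
        "Has a second person reviewed the design and configuration?",
        "Is someone always available to respond to notifications and alarms?",
    ],
}

def open_items(answers):
    """Return (area, question) pairs that have not been answered True."""
    return [
        (area, question)
        for area, questions in CHECKLIST.items()
        for question in questions
        if not answers.get(question, False)
    ]

# Hypothetical assessment: only the power supply question has been satisfied.
answers = {"Do critical devices have a secondary or backup power supply?": True}
for area, question in open_items(answers):
    print(f"[{area}] {question}")
```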

This article was written by Julian Megson, Technical Expert at Westermo in the UK. The recommendations in this document are based on deep knowledge of industrial data communication technologies and hands-on experience from supporting many different customer applications.