Thread: When Healing is Breaking

Thread - the wireless mesh networking protocol - is self-healing. This has been one of the highlights of Thread since the beginning. And the concept works.

Sometimes. And the other times it leads to the network breaking up. 

So let's see what it is all about and what the problem is.

The key difference between Thread (and similar mesh networking concepts) and Bluetooth Mesh is that Thread is really aware of the network topology. There is the concept of the network in the first place. The network has a Leader and Routers. It is formed. Nodes join it. And each of these actions can be reversed. The network may lose its Leader. And Routers. And nodes may drop off. So the network may transition from a "healthy" state into an "unhealthy" state and then try to go back to the healthy state through what is known as healing.

Everything nice on paper so far. Now let's dig deeper.

It all happens automatically. How? The nodes in a Thread network keep "pinging" each other periodically. If the pings are not successful, the healing process starts. And it may include electing a new Leader or new Routers. So far so good. But the unsuccessful pings may not really indicate that the Leader or a Router has been lost. They may simply be a result of network congestion, which happens fairly often, as these networks (802.15.4-based ones, Thread in particular) saturate whenever a sending node (or several such nodes) attempts to send a bit more traffic. That application traffic simply blocks the "network maintenance" traffic, which leads the network to think it is broken and triggers the healing process. Which in turn may lead to nodes losing their Routers or more than one Leader being elected. Which in turn results in the network being partitioned into two (or more) independent networks and nodes permanently losing connections with some other nodes.
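To make the failure mode concrete, here is a deliberately simplified toy model in Python. It is not Thread's actual maintenance mechanism: the function name, the channel capacity, the keepalive interval and the missed-ping threshold are all assumptions picked for illustration. The Router in this model never fails, so any "loss" the child node declares is a false alarm caused purely by application traffic crowding out the pings.

```python
def first_false_loss(app_load, capacity=10, keepalive_every=5,
                     max_missed=3, ticks=100):
    """Return the tick at which a child wrongly declares its Router lost.

    Toy model (all parameters are illustrative assumptions):
    - the shared channel carries at most `capacity` frames per tick;
    - application frames go first, so a keepalive only gets through
      if the application leaves at least one free slot in that tick;
    - the Router itself is always alive in this model.
    Returns None if the child never gives up on its Router.
    """
    missed = 0
    for tick in range(1, ticks + 1):
        if tick % keepalive_every != 0:
            continue  # no keepalive scheduled in this tick
        delivered = app_load(tick) < capacity
        missed = 0 if delivered else missed + 1
        if missed >= max_missed:
            return tick  # healing would be triggered here
    return None


def light_traffic(tick):
    # Steady, low application load: 3 frames per tick.
    return 3


def long_burst(tick):
    # Same load, plus a burst that saturates the channel for ticks 20..40.
    return 12 if 20 <= tick <= 40 else 3


def short_burst(tick):
    # A burst short enough to swallow only two keepalives (ticks 20 and 25).
    return 12 if 20 <= tick <= 25 else 3


if __name__ == "__main__":
    print("light traffic:", first_false_loss(light_traffic))
    print("long burst:   ", first_false_loss(long_burst))
    print("short burst:  ", first_false_loss(short_burst))
```

With these assumed numbers, the light load and the short burst never trip the threshold, while the long burst makes the child give up on a perfectly healthy Router at the third consecutive missed keepalive. Raising the threshold only delays the false alarm; it does not remove the coupling between application load and maintenance traffic.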

In some cases this phenomenon is conceptually similar to what 2022 Formula 1 cars have been exhibiting as "porpoising". Nodes lose connectivity, regain it and lose it again. The bad thing is they may lose it permanently. Or even fail to restore connectivity after a power cycle, which is highlighted in [Modeling and Performance Evaluation of the Thread Protocol, by Ashwini Shamsundar, with the help of Dr. ir. Esko Dijk, Luca Zappaterra, Philips Lighting Research].

The bottom line is - solutions which are nice from the marketing perspective and look good on paper (or in PowerPoint slides) often do not take into account real-world behaviors or the adverse effects they may lead to. How does Bluetooth Mesh avoid this problem? Well, in Bluetooth Mesh there is no network maintenance traffic. It is not necessary, as there is no single point of failure (such as a Router the node is attached to or the network Leader). So there is no danger of this maintenance traffic being starved and leading to catastrophic consequences. Bluetooth Mesh can easily survive even super high traffic (or interference) bursts and just keeps working.
