Mesh: A Racetrack Or City Streets?

Silicon Labs posted a long-expected performance analysis of low-power mesh networks: ZigBee, Thread, and Bluetooth mesh. The report contains a lot of information and, by the way, looks very professional. But as is always the case, it should be taken with a grain of salt.

So firstly, I do not question their results. I have seen their test network and the setup is impressive. But it has to be made clear that the presented results reflect their implementations of mesh, and other implementations may deliver different results. So when reading statements like "Mesh network performance including throughput, latency, and large network scalability is presented," you should be aware this really means "Performance of SiLabs' implementation of mesh networks, including throughput, latency, and large network scalability, is presented". This first published report brings very interesting insights, but it would also be interesting to see how other implementations compare.

Secondly, it may not be entirely clear to casual readers whether they compare apples to apples. Sometimes they compare apples to watermelons, and the comparison criteria are not clear. If, for example, raw throughput is what counts, a simple WiFi system would run circles around all three networks being compared. But trying to squeeze in 200 or more WiFi nodes would require setting up a number of high-capacity access points, and the overall budget (dollars and watts) would be orders of magnitude higher.

SiLabs states that "this testing is all multicast delivery, the Bluetooth flooding mesh is behaving the same as Zigbee and Thread because all device multicasts generally flood the network with three re-broadcasts from each router". The catch here is that the number of relays (or routers) is different for each network, and in some test runs significantly different. In the initially published summary of the report they noted: "All 192 nodes were Bluetooth mesh relays and no relay optimization was used." On the other hand, we know a Thread network supports a maximum of 32 routers. Enabling almost 200 relays in a small office like the one used for the tests is total overkill. Later they added a note saying "For large Bluetooth mesh networks relay optimization can be used to optimize performance.", which is very true. There is another test run with the number of relays reduced to 40, where Bluetooth performed much better. But 40 is still a large number for such a network. Ericsson has a good study on this, examining the effect of reducing the number of relays from 49 down to 12. Ericsson notes: "The best performance among the studied cases is obtained when deploying six relays every 1,000sq m, corresponding to roughly 1.5 percent of the total number of nodes". SiLabs in their tests is still way higher: even when they go down from about 200 relays to 40, roughly 20% of the nodes are still relays, while Ericsson recommends about 1.5%.
And this is of course one of the most important parameters in large / dense network deployments.
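The arithmetic behind that comparison is simple enough to write down. A minimal sketch, using only the figures quoted above (the `relay_ratio` helper is mine, purely illustrative):

```python
# Back-of-the-envelope relay ratios using the figures quoted above
# (SiLabs report and Ericsson study); relay_ratio is plain arithmetic.

def relay_ratio(relays: int, total_nodes: int) -> float:
    """Fraction of nodes acting as relays/routers."""
    return relays / total_nodes

silabs_all = relay_ratio(192, 192)  # every node a relay: 100%
silabs_40  = relay_ratio(40, 192)   # reduced run: still roughly 21%
ericsson   = 0.015                  # ~1.5% recommended by Ericsson

print(f"all relays: {silabs_all:.0%}, reduced: {silabs_40:.0%}, "
      f"Ericsson target: {ericsson:.1%}")
```

Even the "optimized" 40-relay run is more than ten times above the relay density Ericsson found to perform best.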

As a side note, one may argue that relay selection in Bluetooth mesh is "manual" while the other networks can do it by themselves. The question is what "manual" really means... Certainly not walking around with a ladder and reconfiguring DIP switches in nodes sitting in the ceiling. It is all done by software. Thread elects routers by kicking off a negotiation procedure defined in the protocol. Bluetooth mesh delegates this to an entity called the Provisioner, which is also a piece of software (and not an engineer on a ladder...). The approach is simply different and can be fully automatic too. Bluetooth Provisioners may use additional data they have to select relay nodes: they may have floor plans, and they may know which devices are more capable or more reliable and thus better suited to be relays. And, most importantly, they can select multiple nodes to relay messages in the same area, removing single points of failure. This is called multi-path and results in a network that always works and is always healthy. In contrast, when a relay / router fails, the other technologies fall back to a healing procedure and try to elect a new router. This takes time and causes a disruption of service that may exhibit itself as a "popcorning" effect when controlling lights, for example. What is even worse, this self-healing procedure may spin out of control when a router node sits at the edge of radio range and repeatedly falls in and out of the network. In that case the network may be in a continuous self-healing mode and behave very unreliably.
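To make "done by software" concrete, here is a hypothetical sketch of how a Provisioner might pick relays for redundant (multi-path) coverage. Nothing below comes from the Bluetooth mesh specification; the function name, node coordinates, and greedy heuristic are all my own assumptions, chosen only to illustrate the idea:

```python
# Hypothetical relay selection: greedily promote nodes to relay until
# every node is within radio range of at least `redundancy` relays.
# This is an illustrative toy, not the Bluetooth mesh spec's mechanism.

from math import dist

def pick_relays(nodes, radio_range, redundancy=2):
    relays = []

    def coverage(n):
        # How many already-chosen relays can reach node n?
        return sum(1 for r in relays if dist(n, r) <= radio_range)

    while any(coverage(n) < redundancy for n in nodes):
        candidates = [c for c in nodes if c not in relays]
        if not candidates:
            break  # some nodes cannot reach enough relays at all
        # Promote the candidate that helps the most under-covered nodes.
        relays.append(max(
            candidates,
            key=lambda c: sum(1 for n in nodes
                              if coverage(n) < redundancy
                              and dist(n, c) <= radio_range)))
    return relays

# Four nodes spaced 1.0 m apart on a line, radio range 1.5 m:
grid = [(float(x), 0.0) for x in range(4)]
chosen = pick_relays(grid, radio_range=1.5)
```

A real Provisioner could weight such a selection with floor plans, link quality, or mains power availability; the point is simply that the choice is made by software, not by hand.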

Finally, the tests presented involve essentially a single node that originates the messages and then a bunch of relays. This hardly reflects any real-life use case. How many occupancy sensors are there in an office? One? Or hundreds? This test is like measuring the throughput of a car racetrack: the cars never cross paths. Any test that attempts to reflect a real-life scenario must use a completely different traffic pattern. Not a racetrack but a maze of intersecting city streets. How about hundreds of cars, each going in a different direction, negotiating intersections and stopping at traffic lights to avoid collisions? This is the real challenge. Racing on a track is fast and easy. Building a city transportation system that does not grind to a halt or produce collisions at every intersection is the challenge! The design element fundamental to Bluetooth mesh is the small packet and the fast radio that carries it. The result is a transportation system that is most effective in crowded mesh environments with randomly originated transmissions.
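The difference between the two traffic patterns can be illustrated with a deliberately crude toy model (it models no real radio protocol; all names and numbers are mine). A single originator can serialize its own packets and never collides with itself, while many independent originators transmitting at random times inevitably overlap:

```python
# Toy model contrasting one originator (serialized, no self-collisions)
# with many independent originators picking random transmission slots.
# Not a model of Bluetooth mesh, Zigbee, or Thread -- illustration only.

import random

def overlapping_slots(n_senders, n_slots, seed=7):
    """Each sender transmits once in a random slot; count slots where
    two or more transmissions overlap (a stand-in for collisions)."""
    rng = random.Random(seed)
    slots = [0] * n_slots
    for _ in range(n_senders):
        slots[rng.randrange(n_slots)] += 1
    return sum(1 for s in slots if s > 1)

racetrack = 0                                # one sender, serialized
city = overlapping_slots(200, n_slots=1000)  # 200 independent senders

print(f"racetrack overlaps: {racetrack}, city overlaps: {city}")
```

Even this crude model shows why a benchmark with one traffic source says little about a network where hundreds of sensors transmit whenever they please.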
