Crossing the Mesh Chasms
"Particle is discontinuing development of Particle Mesh, our OpenThread-based mesh networking solution, and will no longer be manufacturing the associated Xenon development board."
Particle (previously known as Spark) is a very well-respected brand in the IoT arena. Investors recently poured $40M into it in a Series C round. The Thread Group even calls it the most widely used IoT platform. And Thread marketing has managed to elevate its own brand to the "best way to connect and control products in the home and buildings".
How come?
I have to admit I have been living in the marketing shadow of Thread since the alliance was announced in 2014. They have indeed run the best marketing campaign, convincing the whole world that it should opt for a "real", "true", "routed", "self-healing", "IP-based" mesh networking solution for low power wireless communications.
Living in that shadow was my own choice. George Gilder has always taught me to "listen to technology". I have been following the master's advice, and I am really thankful for it. Back in 2014, experience was my advantage when starting the monumental - against all odds - effort of building Bluetooth mesh, the alternative mesh networking solution for low power wireless communications. Not routed, not self-healing, not IP-based. But one that simply would fly. And so it does.
The experience was real - it came from building a wireless IP-based, routed, self-healing network based on the 802.15.4 radio. That was in 2012, two years before Thread was launched. Long story short: it was a failure. It was a no-brainer to start: of course we wanted the real meshing radio, the 802.15.4. And of course we wanted real, true, routed mesh. And of course the only sensible choice was support for IPv6 on top of it. All stars aligned.
It just did not work.
Particle has been less blunt, saying:
"we discovered the ways in which it [OpenThread - ed.] wasn't ready for us"
and:
"802.15.4 mesh networking turns out to not be the right solution for most of the customers who wanted to use it"
To be honest, I'd love to learn the concrete details. That said, our own conclusion, after trying to bring an 802.15.4 mesh to a product level acceptable to our customers, would be the same.
Despite many desperate efforts, the 802.15.4-based mesh network we had kept collapsing under any meaningful traffic. It seemed to work when transporting one message at a time. But as soon as we put a bit more stress on it by increasing the number of messages injected into the network, it saturated and collapsed, ceasing all communications. And we realized that a routed (read: single-path), self-healing (read: requiring link-layer acknowledgements) and IP-based (read: heavy payloads) network on top of an 802.15.4 (read: slow and interference-prone) radio was a flawed architecture. We realized it would never fly. It was like an airliner made of iron, with additional chains attached to the wheels. Yes, 8 years ago we were where Particle is today: forced to drop the concept and write off the investment.
So we listened to the technology.
The technology told us that wireless is a shared medium and that interference defines the rules there. That the way to avoid interference (collisions) is to make the radio messages ultra compact. The technology also told us that the nice-looking "routed" wireless mesh could never exist, due to the lack of radio isolation between the nodes. We treated all those technology challenges as great opportunities for our brand new wireless-first design, which materialized as Bluetooth Mesh 1.0 in 2017.
The technology also told us many things that sounded counter-intuitive. Or unacceptable. Like that reliability does not exist in wireless. Messages are lost in transit due to interference. There is only a certain chance, a certain probability, that a message will go through. That probability goes up if you send more copies of a message. Say your packet loss is 20%. So you have an 80% chance the message will get through on a single attempt. But on two attempts the chance goes up to 96%, and then to 99.2% on three attempts. So instead of sending a message, expecting an acknowledgement, and resending the message (3 radio messages), why not send the original message 3 times instead? Especially when your target is a group of nodes. At some point, the statistics deliver the reliability you are targeting: 3-nines or 7-nines or whatever. And even sending 7 copies of a message in a row to a small group generates way less traffic than requesting an acknowledgement from each group member.
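The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming that every copy is lost independently with the same probability (real interference is often bursty, so actual numbers will differ). The function names `delivery_probability` and `copies_needed` are made up for this example and are not part of any Bluetooth mesh API.

```python
def delivery_probability(packet_loss: float, copies: int) -> float:
    """Chance that at least one of `copies` transmissions gets through.

    Assumption: each copy is lost independently with probability `packet_loss`.
    """
    if not 0.0 <= packet_loss < 1.0:
        raise ValueError("packet_loss must be in [0, 1)")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The message is lost only if every single copy is lost.
    return 1.0 - packet_loss ** copies


def copies_needed(packet_loss: float, target: float) -> int:
    """Smallest number of copies that reaches the target delivery probability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    copies = 1
    while delivery_probability(packet_loss, copies) < target:
        copies += 1
    return copies


if __name__ == "__main__":
    # Reproduces the 80% / 96% / 99.2% figures from the text (20% packet loss).
    for n in (1, 2, 3):
        print(f"{n} copies: {delivery_probability(0.2, n):.1%}")
    # How many copies for "3-nines" and "7-nines" under the same assumption?
    print("3-nines:", copies_needed(0.2, 0.999))
    print("7-nines:", copies_needed(0.2, 0.9999999))
```

Under this simple model the number of copies grows only logarithmically with the reliability target, which is why plain repetition gets to a high number of nines quickly.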
The technology also told us about one very obvious, but very often overlooked, behavior of radio transceivers: they are half-duplex. That means they are deaf (cannot receive anything) when they talk (send messages). This happens all the time to a relay / router node, as it keeps on receiving messages and then retransmitting them. The more traffic it has to transmit, the more deaf it is, and the more inbound messages it loses. To other mesh nodes such a relay starts to look unavailable. The self-healing approach says a new relay/router should be established by the network if the current one is unavailable. And here is the catch: in a busy network each relay/router node is both available and unavailable. At the same time. Like Schrödinger's quantum cat. Its available / not available ratio can be 50% all the time. If this happens, and it does happen, especially in a busy network, the network collapses. It becomes entirely busy with the self-healing process, continuously electing new relays. Of course relay elections generate radio traffic, so the network becomes entirely busy with itself and stops transporting any other messages. Self-healing is a lie. It does not work in real busy networks, period.
The final solution to the self-healing problem is: have a network that does not need it. A network that is always healthy. How? Simple: drop the concept of the next-hop destination and allow multiple paths to exist at the same time, putting, again, statistics to work. Instead of relying on nodes that are elected as relays/routers for a single-lane path across the network, allow multiple paths to be established in parallel. Multiple paths add the required healing effect proactively. Instead of acting reactively - awaiting acknowledgements, marking a relay as unavailable, starting the healing process - why not have two or three or more alternate paths ready upfront? And send multiple copies of a message in parallel, each along a separate redundant path? The chances that at least one copy of the message reaches the destination can easily reach the level at which even a human would consider it reliable enough to be used in life-safety systems, such as vehicle brakes.
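The effect of parallel paths can be illustrated with the same kind of statistics. The sketch below is a deliberately simplified model, not a description of how Bluetooth mesh relays actually behave: it assumes every hop loses a message independently with the same probability, and that the redundant paths are fully independent of each other. In a real network paths overlap and share the same radio medium. The function name `multipath_delivery` is hypothetical.

```python
def multipath_delivery(hop_loss: float, hops: int, paths: int) -> float:
    """Chance that at least one copy arrives when sent over `paths` parallel paths.

    Assumptions (for illustration only):
      - each hop loses the message independently with probability `hop_loss`
      - every path has the same number of hops
      - paths do not share relays and fail independently of each other
    """
    if not 0.0 <= hop_loss < 1.0:
        raise ValueError("hop_loss must be in [0, 1)")
    if hops < 1 or paths < 1:
        raise ValueError("hops and paths must be at least 1")
    # One path delivers only if every hop along it succeeds.
    single_path = (1.0 - hop_loss) ** hops
    # Delivery fails only if all parallel paths fail.
    return 1.0 - (1.0 - single_path) ** paths


if __name__ == "__main__":
    # A single two-hop path at 20% loss per hop versus several in parallel.
    for k in (1, 2, 3, 5):
        print(f"{k} path(s): {multipath_delivery(0.2, hops=2, paths=k):.4f}")
```

With these assumed numbers a single two-hop path delivers 64% of the time, while three such paths in parallel deliver about 95% of the time, without any acknowledgements or relay elections.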
This is our story, at Silvair. The story of the counter-intuitive pursuit of the ultimate low power wireless mesh technology, known as Bluetooth mesh. Judging by what happened to Particle, we were lucky to land in their position back in 2012. And then we were extremely consistent in applying the lessons learned to the architectural design of the Bluetooth standard. And to be fair - we gave early warnings when we published the first edition of the "Tale of Five Protocols" back in 2014. It was a bet. A bet because we could not know if it would fly until it did. But now our confidence is complete. Bluetooth mesh really has taken off, and we have been flying with the wheels up for a long time now.
To be honest, this is one half of the story. The other half is addressing what Zach mentioned in his blog as the other challenge:
"building a proper network management solution that would meet our expectations for simplicity while being powerful enough to support large-scale deployments would be a massive undertaking, deserving of an entire company on its own"
That became obvious to us many years ago. While designing the Bluetooth mesh specifications, we were equally busy building the premier network management platform for commissioning, maintenance, and as a foundation for value-added services. That award-winning platform is an inherent piece of the overall puzzle. And a story of its own. In a nutshell, it enables exactly that: simplicity while being powerful enough to support large-scale deployments. The deployments that are happening now on a global scale.