DALI+ over Thread: Capacity

The post on Bluetooth NLC vs DALI+: Capacity and Performance has generated very valuable feedback. I promised to shed some more light on my network capacity calculations, so here they are.

The key assumptions are:

- The Thread network operates at 2.4 GHz using IEEE 802.15.4 data links at 250 kbps;
- Transmitting nodes use the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) method to manage the shared communication channel;
- Each unicast transmission is acknowledged;
- There are two hops (say: sensor → router and router → lighting controller);
- The minimum data frame at the PHY is 50 octets (+6: preamble + SFD), so ~1.8 ms of airtime at 250 kbps;
- The ACK frame (including the SIFS time) takes ~0.5 ms;
- The average CSMA/CA overhead is ~1.4 ms.

That gives us 3.7 ms per hop in total (1.8 + 0.5 + 1.4). So with two hops we have a channel occupancy of about 7.4 ms per end-to-end message. At full 100% channel occupancy, that would give us a capacity of about 135 end-to-end messages per second (one way).
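The arithmetic above can be reproduced in a few lines. This is only a sketch of the estimate, using the figures from the assumptions list; the function and constant names are mine and purely illustrative.

```python
# Back-of-the-envelope channel occupancy, using the assumptions listed above.
# All names here are illustrative; the numbers come from the post.

BIT_RATE_BPS = 250_000   # IEEE 802.15.4 at 2.4 GHz
FRAME_OCTETS = 50 + 6    # minimum data frame plus preamble + SFD
ACK_MS = 0.5             # ACK frame, including the SIFS time
CSMA_MS = 1.4            # assumed average CSMA/CA overhead
HOPS = 2                 # sensor -> router -> lighting controller


def frame_airtime_ms(octets, bit_rate_bps=BIT_RATE_BPS):
    """Time on air for a frame of the given size, in milliseconds."""
    return octets * 8 / bit_rate_bps * 1000


def per_hop_ms():
    """Channel occupancy of one acknowledged unicast hop."""
    return frame_airtime_ms(FRAME_OCTETS) + ACK_MS + CSMA_MS


def capacity_msgs_per_s(hops=HOPS):
    """End-to-end messages per second at 100% channel occupancy."""
    return 1000 / (hops * per_hop_ms())


if __name__ == "__main__":
    print(f"frame airtime: {frame_airtime_ms(FRAME_OCTETS):.2f} ms")
    print(f"per hop:       {per_hop_ms():.2f} ms")
    print(f"two hops:      {HOPS * per_hop_ms():.2f} ms")
    print(f"capacity:      {capacity_msgs_per_s():.0f} msgs/s")
```

Changing `FRAME_OCTETS` or `HOPS` shows how quickly the ceiling drops with longer packets or deeper routes.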

But... 100% would simply kill the network.

Thread is very fragile. We learned this the hard way when experimenting with Thread back in 2012-2013. Yes, that was 2 years before Thread was officially launched. These experiments really gave birth to what is now Bluetooth mesh / Bluetooth NLC (the work started in 2014).

The issue with Thread is that you need to leave it (a lot of) room to breathe. That means not saturating the radio channel. Thread needs headroom for retransmissions, routing / network control (MLE, child supervision, etc.) and occasional message bursts. And then there is the traffic going in the opposite direction, even if it is not that frequent, as Allan pointed out in his comment. Many great deep-dive discussions on this can be found on the internet, e.g., this one: Not stable Thread network topology in a big > 100 nodes network.

How much room then? A 50% channel load is where my red scale starts. That means about 65 messages per second. The yellow scale would probably be somewhere between 30% and 50% load, so it starts at about 40 messages per second. Below 40 messages/second, with very short packets and stable radio conditions, you should be safe.
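To make the scale concrete, here is a small sketch that maps a message rate to a load zone. It assumes the 135 messages/second ceiling computed earlier in this post; the "green" label and the function names are my own shorthand for the "should be safe" region, not established terminology.

```python
# Rough load zones, assuming a 135 msgs/s ceiling at 100% channel occupancy.
# Thresholds follow the post: red from 50% load, yellow from 30% to 50%.

FULL_LOAD_MSGS_PER_S = 135
RED_FROM = 0.50
YELLOW_FROM = 0.30


def msgs_at_load(load_fraction, full=FULL_LOAD_MSGS_PER_S):
    """Message rate corresponding to a given channel load fraction."""
    return full * load_fraction


def zone(msgs_per_s, full=FULL_LOAD_MSGS_PER_S):
    """Classify an end-to-end message rate as 'green', 'yellow' or 'red'."""
    load = msgs_per_s / full
    if load >= RED_FROM:
        return "red"
    if load >= YELLOW_FROM:
        return "yellow"
    return "green"  # hypothetical label for the "should be safe" region


if __name__ == "__main__":
    for rate in (20, 50, 70):
        print(rate, "msgs/s ->", zone(rate))
```

The exact thresholds come out at 67.5 and 40.5 messages per second; the post rounds them down to 65 and 40 to stay on the cautious side.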

Now, my 20-node network capacity number (questioned by the DALI+ proponents) came from the assumption of 1 message per second per node. In other words, each node is assumed to generate, on average, 1 message per second: that could be an application message (a sensor detecting occupancy or a driver reporting energy consumption) or a network maintenance message. Lowering the user experience bar (e.g., by limiting the sensor reporting rate) can of course increase the node count. But such a limit must be carefully implemented and obeyed.
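The trade-off between reporting rate and node count is simple proportionality. The sketch below assumes a fixed message budget of 20 messages per second, which is what 20 nodes at 1 message per second each implies; the function name and the budget constant are illustrative only.

```python
# Node count versus reporting interval, for a fixed message budget.
# BUDGET_MSGS_PER_S is an assumption implied by "20 nodes x 1 msg/s".

BUDGET_MSGS_PER_S = 20


def max_nodes(avg_interval_s, budget=BUDGET_MSGS_PER_S):
    """Nodes that fit in the budget if each sends one message per interval."""
    if avg_interval_s <= 0:
        raise ValueError("interval must be positive")
    return int(budget * avg_interval_s)


if __name__ == "__main__":
    for interval in (1, 2, 5):
        print(f"one message every {interval} s -> {max_nodes(interval)} nodes")
```

The interval is an average over all traffic a node generates, including network maintenance messages, so the node count only holds if every node actually respects it.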

PS. There have been some more valuable comments on LinkedIn.
