Digital Fragility

Almost everything today is digital and therefore software-driven. That software keeps getting more complex, with many layers and dependencies. Statistically, this complexity leads to failures that are ultimately unavoidable. No matter how thorough the testing is, it cannot cover and catch every possible scenario.

Even simple things like alarm clocks sometimes fail to go off, something unthinkable in the old days of mechanical alarm clocks. And there are times when a working alarm clock is critical, such as when you have an intercontinental flight to catch early in the morning. Just recently, Apple confirmed that iPhone alarms may not go off due to a software bug. That is why, even as a digital gadget geek, I have a rule to set an analog alarm clock for exactly that case: an early morning flight.

To regular people this is close to unthinkable: the most premium mobile phone, with millions of users and billions in R&D and QA budgets behind it, can still fail at a task as simple as waking you up in the morning.

The consequences can be even scarier when a digital device such as an insulin pump fails as a result of a bug in the companion app. The app crashes, the OS restarts it, it tries to connect to the pump and crashes again. This cycle repeats until the pump battery is drained, which can lead to coma or death.

This insulin pump issue also highlights the fragility of an architecture in which the controller (the app in this case) is external to the actuator (the pump). A proper design would be a fully autonomous pump that knows on its own when to inject insulin. The phone app should only set the conditions that activate the pump and monitor its actions.
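To make this split concrete, here is a minimal Python sketch of an actuator that owns its control loop while the app only configures and monitors it. Everything in it is hypothetical: the `Conditions` and `Pump` names, the fields, and the numbers are illustrative assumptions, not a real pump API and not medical guidance.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Conditions:
    """Settings pushed by the phone app (hypothetical fields, illustrative units)."""
    threshold: float   # sensor reading above which the pump acts
    dose_units: float  # amount delivered per activation


@dataclass
class Pump:
    """Actuator that runs its own control loop (illustrative only)."""
    conditions: Optional[Conditions] = None
    log: List[str] = field(default_factory=list)

    def configure(self, conditions: Conditions) -> None:
        # The only write path the app has: it stores settings, it never doses.
        self.conditions = conditions

    def tick(self, reading: float) -> float:
        """One control-loop step, executed on the pump. Returns units delivered."""
        if self.conditions is None or reading <= self.conditions.threshold:
            return 0.0
        self.log.append(f"delivered {self.conditions.dose_units} at reading {reading}")
        return self.conditions.dose_units

    def history(self) -> List[str]:
        # Read-only monitoring view for the app; returns a copy of the log.
        return list(self.log)


pump = Pump()
pump.configure(Conditions(threshold=180.0, dose_units=1.0))
# From here on the app may crash or disconnect; the pump keeps deciding locally.
delivered = [pump.tick(r) for r in (120.0, 200.0, 150.0)]
```

In this sketch a crashing app can at worst fail to update the settings or read the history; the decision to act never depends on the phone being connected.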

The aviation industry is the greatest example of designing robust and resilient systems. It is a long-term, iterative process of continuous learning from almost every aircraft accident. Properly designed aircraft today have no single points of failure. Engines, for example, are fully autonomous thanks to Full Authority Digital Engine Control (FADEC) units: the pilots' inputs are just engine setpoints, and all the control loops are handled autonomously by each engine.

In my field, this distributed autonomy is also at the root of the resilience of Bluetooth NLC lighting systems. Among other things, every node runs its own internal clock and executes scheduled events autonomously. The lights will come on even if communications are broken or the network timekeeper device goes offline. I'll be covering this and more at this week's IES & DOE Research Symposium, discussing resilient lighting control architectures. Looking forward to a great discussion!
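The node-level autonomy described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Bluetooth mesh API: the `LightNode` class, its methods, and the schedule format are all hypothetical. The point it shows is that the network timekeeper only corrects the local clock, while the schedule is stored and executed on the node itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

SECONDS_PER_DAY = 24 * 3600


@dataclass
class LightNode:
    """A lighting node with a local clock and a locally stored schedule (hypothetical)."""
    schedule: List[Tuple[int, int]]  # (second of day, light level 0-100)
    clock_s: int = 0                 # local clock, seconds since midnight
    level: int = 0                   # current light level

    def sync_time(self, network_time_s: int) -> None:
        # Called only when the timekeeper is reachable; corrects local drift.
        self.clock_s = network_time_s % SECONDS_PER_DAY

    def advance(self, seconds: int) -> None:
        # The local clock ticks on its own; no network access is involved.
        for _ in range(seconds):
            self.clock_s = (self.clock_s + 1) % SECONDS_PER_DAY
            for at_s, level in self.schedule:
                if at_s == self.clock_s:
                    self.level = level


# Lights to 80% at 18:00, off at 23:00.
node = LightNode(schedule=[(18 * 3600, 80), (23 * 3600, 0)])
node.sync_time(17 * 3600 + 1800)  # last successful sync at 17:30
# The timekeeper now goes offline; the node still runs its schedule.
node.advance(3600)                # local time is now 18:30
evening_level = node.level
node.advance(5 * 3600)            # local time is now 23:30
night_level = node.level
```

Losing the timekeeper in this model only means the local clock may slowly drift; the scheduled events themselves still fire.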

Comments

  1. Good points! I have recently missed several meetings because I did not hear a phone (Android) notification. I think when my BT headset is on, the notification is going to the BT headset instead of the phone speaker, and then I don't hear it. Always something ...

  2. Exactly! Management of audio output is quite complex and even harder to clearly communicate to users / meet their intentions. Including watching porn when your iPad is connected to the neighbor's Bluetooth speaker... Would never happen with a 3.5mm audio jack...

