When Software Goes Down
Today's aircraft are considered very safe. The key to that safety record is redundancy of all critical systems. Everything is doubled, tripled or quadrupled. From a probability-theory perspective, multiplying the critical components vastly reduces the chance of a total failure, because failures of individual units are assumed to be independent. That is mostly true for legacy, mechanical components. A blade failure in one engine does not cause the other engine to fail, and the chance of two blades (one in each engine) failing at the same moment is practically zero.
But it turns out that when software is involved, equipment failures can be fully synchronous, which I believe is what happened in the Air India accident.
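To put numbers on the independence argument, here is a small Python sketch of the probability that all n redundant units fail. It uses the beta-factor model from reliability engineering, in which a fraction `beta` of each unit's failure probability `p` is a common-cause event (for example, a shared software bug) that takes out every unit at once. The function name `p_all_fail` and the values of `p` and `beta` are hypothetical and chosen only for illustration; they are not 787 data.

```python
def p_all_fail(p: float, n: int, beta: float) -> float:
    """Probability that all n redundant units fail in the same period.

    Beta-factor model (assumed for illustration): each unit's failure
    probability p splits into an independent part (1 - beta) * p and a
    common-cause part beta * p that hits every unit simultaneously.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= beta <= 1.0 and n >= 1):
        raise ValueError("p and beta must be in [0, 1] and n must be >= 1")
    p_common = beta * p            # one shared event fails all units at once
    p_indep = (1.0 - beta) * p     # each unit's own, independent failure
    # All units fail if the common event occurs or, if it does not,
    # if every unit fails independently.
    return p_common + (1.0 - p_common) * p_indep ** n


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-unit failure probability
    for beta in (0.0, 0.01, 1.0):
        print(f"beta={beta:<5} P(all 4 fail) = {p_all_fail(p, 4, beta):.3e}")
```

With `beta = 0` (full independence) the result is `p**4`, about 1e-16. A common-cause share of just 1% raises it to about 1e-6, and with `beta = 1` (all four units fail together) it collapses to `p` itself: four units are then no safer than one.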
In the Boeing 787 Dreamliner, the fuel pumps are electric, powered from the 235 V AC power bus. That bus is supplied by four generators. Four, for redundancy.
There is also the FAA-2015-0936 Airworthiness Directive Amendment, which describes a software bug that looks like a counter overflow, with the following consequence:
a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode
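The AD does not explain the mechanism, but 248 days matches one explanation that circulated at the time: a signed 32-bit counter incremented every 10 ms (hundredths of a second). The Python sketch below checks that arithmetic and emulates the wrap-around. The counter width, the 10 ms tick and the helper names are assumptions for illustration, not details taken from the AD or from Boeing.

```python
TICK_MS = 10                          # assumed tick period: 10 ms (hypothetical)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
MS_PER_DAY = 24 * 60 * 60 * 1000


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wrap-around of a signed 32-bit integer."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def days_until_overflow(tick_ms: int = TICK_MS) -> float:
    """Uptime in days after which a counter starting at 0 first wraps."""
    ticks = INT32_MAX + 1             # the tick that pushes the counter past INT32_MAX
    return ticks * tick_ms / MS_PER_DAY


if __name__ == "__main__":
    print(f"overflow after {days_until_overflow():.2f} days")
    # One tick past the maximum, the counter jumps to a large negative value.
    # In this hypothetical model, code that trusts the counter would misbehave here.
    print(wrap_int32(INT32_MAX + 1))
```

This prints an overflow time of about 248.55 days and a wrapped value of -2147483648, consistent with the 248 days quoted in the AD, though that agreement alone does not prove this is the actual defect.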
The recommended workaround at the time (2015) was to power-cycle the entire aircraft periodically. That is what you sometimes do to your computer or phone. Now we are supposed to do it to a commercial aircraft.
This particular bug has already been fixed. It is not clear whether the Air India aircraft had the update installed, or whether it had been power-cycled in time. That does not matter: if one such bug exists, there may be more that simply have not surfaced yet.
What matters is the failed architecture. The software-driven generator control units (GCUs) are in fact a single point of failure, because they all run the same software, which boots up at exactly the same time. Any overflow, memory leak or similar condition will therefore hit all four at the same moment, bringing them all down together.
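A toy model makes the timing argument concrete: if every unit runs the same code and the same latent bug trips after the same uptime, the failure instant depends only on when the unit was booted. The Python sketch below assumes a hypothetical fault after 2**31 ticks of 10 ms; the `GCU` class, its fields and the one-day stagger are illustrative inventions, not a model of the real 787 hardware.

```python
from dataclasses import dataclass

# Hypothetical: a latent bug that trips after 2**31 ticks of 10 ms uptime.
FAULT_AFTER_TICKS = 2**31
TICKS_PER_DAY = 24 * 60 * 60 * 100    # number of 10 ms ticks in one day


@dataclass(frozen=True)
class GCU:
    name: str
    boot_tick: int                    # when this unit was last powered up

    def fault_tick(self) -> int:
        # Same software, same bug: the fault depends only on uptime.
        return self.boot_tick + FAULT_AFTER_TICKS


def distinct_fault_times(units: list[GCU]) -> list[int]:
    """Distinct instants at which at least one unit enters failsafe mode."""
    return sorted({u.fault_tick() for u in units})


if __name__ == "__main__":
    # All four units powered up together, as when the whole aircraft is switched on.
    synchronous = [GCU(f"GCU{i}", boot_tick=0) for i in range(1, 5)]
    # Hypothetical alternative: boots staggered one day apart.
    staggered = [GCU(f"GCU{i}", boot_tick=(i - 1) * TICKS_PER_DAY) for i in range(1, 5)]

    print("synchronous:", len(distinct_fault_times(synchronous)), "fault instant(s)")
    print("staggered:  ", len(distinct_fault_times(staggered)), "fault instant(s)")
```

The synchronous fleet has a single fault instant, so all four generators drop out together; the staggered one has four. Staggering only spreads the failures out in time, since every unit still fails eventually unless it is reset, so the sketch illustrates the common-mode timing rather than a fix for the bug.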
I cannot imagine how this architecture was approved as airworthy in the first place, or how 787s were not grounded when the FAA-2015-0936 bug was discovered in 2015. Clearly Boeing and the FAA put an aircraft with a single point of failure in the air and allowed it to carry passengers. And, as the MCAS case in the 737 MAX series has taught us, it was not the first time the company has done so.
[Update]: I was pointed to https://kb.skyhightex.com/knowledge-base/how-to-master-b787-fuel-system/, which says:
If All Pumps Fail, Each Engine Can Suction Feed From Its Respective Fuel Main Tank.
So it seems the sequence was loss of engines -> loss of AC power, not the other way around. Still, something must have caused both engines to lose thrust simultaneously. It is also possible that suction feed alone could not supply enough fuel for the engines to produce the thrust needed to keep the aircraft flying.