I am a pilot. I am not a software engineer or a software writer. That said, like everyone else, I depend on software in just about everything I do every day. As pilots, we are all too familiar with the problems on the Boeing 737 MAX. We are being told that faulty software is the cause. Yes, there were or could have been problems with the pilot training, but Boeing is rewriting the software, and we are assured that once it is complete, the problem will go away and the aircraft will be safe. Or will it?
Airbus is having an issue as we speak with the A350. In a mandatory airworthiness directive (AD) reissued in July 2019, EASA urged operators to turn their A350s off and on again to prevent “partial or total loss of some avionics systems or functions.”
This must be done before 149 hours of continuous power-on operation elapse.
In 2015, the Boeing 787 suffered a similar problem. A counter-overflow bug was discovered that could cause the electrical generators to shut themselves down after 248 days of continuous power-on operation.
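Boeing has not published the low-level details, but that 248-day figure happens to match what you get from a signed 32-bit counter ticking in hundredths of a second: it runs out of room after about 248.55 days. The sketch below is my own illustration of that arithmetic, not Boeing's code:

```python
# A signed 32-bit counter tops out at 2**31 - 1 ticks. If each tick is one
# hundredth of a second, the counter overflows after roughly 248.55 days.
MAX_INT32 = 2**31 - 1

ticks_per_day = 100 * 60 * 60 * 24  # hundredths of a second in one day
days_to_overflow = MAX_INT32 / ticks_per_day
print(round(days_to_overflow, 2))  # 248.55

def tick(counter):
    """Advance a simulated 32-bit signed counter by one tick, wrapping as a C int32 would."""
    counter += 1
    if counter > MAX_INT32:
        counter -= 2**32  # wrap into the negative range
    return counter

# The very next tick after ~248 days flips the counter negative.
print(tick(MAX_INT32))  # -2147483648
```

A negative value where the code expects an ever-growing one is exactly the kind of state that can send a system into an unplanned shutdown, and it only appears after months of uninterrupted operation, which is why routine power cycles masked it for so long.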
These are just a couple of the latest incidents that are occurring on our newest generation of aircraft. Why are we having these computer-related problems?
I have been doing some research and believe me, it is hard—very hard—to sift through the BS on this subject.
In the “old days,” testing was straightforward. As an example, many of us have seen the video of the wing bending on the original Boeing 747. Straightforward. You bend something until it breaks, do it again and again, and then you have a pretty good idea of when it will break. If the breaking point is well within the limits you set, you are good to go.
You cannot bend or break software. So, what do you do? You put it through testing that some people consider industry standard and others don’t. Here is a description of testing that I found.
The definition of software testing, according to the ANSI/IEEE 1059 standard is, “A process of analyzing a software item to detect the differences between existing and required conditions (i.e., defects) and to evaluate the features of the software item.”
Makes sense to me. Let me give you an example. I have a cell phone. It is full of software. How many times must I turn it on and off before it will fail? Will it always fail the same way? Will my model of phone fail for me and you at the same time? We cannot answer these questions. With the bending wing, we can, and we have a very good idea that, at a point in the bending process, the wing will fail.
Software is not a wing. It is code, written to run the unit it helps operate. Specifically, source code is made up of the numerous lines of instructions that programmers write to create a software application. Once the source code is written, it is compiled into a machine-readable program, which is installed on a computer as an application.
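Python makes that text-to-instructions step easy to watch, because it compiles source into byte code on the fly. The one line of “avionics” source below is purely my own invention for illustration:

```python
import dis

# One hypothetical line of source code, held as plain text.
source = "throttle = min(commanded, limit)"

# Compiling turns the text into an executable code object.
code_object = compile(source, "<example>", "exec")

# Disassembly shows the lower-level instructions the text became.
dis.dis(code_object)
```

The disassembly listing is those “machine-readable” instructions: the human-readable line is gone, and what runs is something no pilot, and few programmers, would ever read directly.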
So how is it tested so that we are as sure as possible that it will not fail, or we know exactly when it will fail and we can replace it before that time?
There is manual testing, automation testing, static testing, and dynamic testing. Then there are approaches to testing, like white box, black box, and gray box. Finally, there are testing levels: unit testing, integration testing, system testing, and acceptance testing. This is all very impressive, but it still doesn’t tell me how long a unit will run on its software, or exactly why it will fail, the way the wing will fail.
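To make “unit testing,” the smallest of those levels, concrete, here is a minimal example in Python. The function and its limits are hypothetical, invented only for illustration; the point is that a passing test proves the cases someone thought to write down, and nothing beyond them:

```python
def clamp_altitude(target_ft, floor_ft=0, ceiling_ft=41000):
    """Keep a commanded altitude within hypothetical aircraft limits."""
    return max(floor_ft, min(target_ft, ceiling_ft))

def test_clamp_altitude():
    # Each assertion checks one case the tester thought of in advance.
    assert clamp_altitude(10000) == 10000   # normal command passes through
    assert clamp_altitude(-500) == 0        # below the floor is clamped up
    assert clamp_altitude(99999) == 41000   # above the ceiling is clamped down

test_clamp_altitude()
print("all tests passed")
```

Unlike bending a wing, a green test run tells you the software behaved for these three inputs, today; it says nothing about the input nobody imagined, which is exactly the gap the author is pointing at.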
So why does software fail? Here is some of what I found on that subject.
- Lack of user participation
- Changing requirements
- Unrealistic or unarticulated project goals
- Inaccurate estimates of needed resources
- Badly defined system requirements
- Poor reporting of the project’s status
- Lack of resources
- Unmanaged risks
This is all very reassuring. I mentioned that I am a pilot and not a programmer. Given this, how do I know that the software testing that goes into an aircraft is more complete than that which goes into a cell phone? I have no way of knowing that. If a hydraulic pump on an aircraft engine fails, it is sent out and bench tested. A fault is found, and a directive is issued so that all other pumps can be inspected and fixed. All the operators who use those pumps are notified. It is a simple and time-tested procedure.
Does a software failure on one aircraft necessarily mean that item will fail on all aircraft? With the hydraulic pump we know that things such as temperature, lubrication, vibration, and other factors can cause the failure. How so with software? We are in a highly regulated business. Software and the people who write it are not. They are, for the most part, self-regulated. Once you are certified as a software engineer, you can write for anyone who will hire you.
Just look at how often the “check engine” light illuminates on a car or truck. That is a computer program. From what I can find out, not much more goes into the software for an aircraft than goes into that vehicle.
How often are we faced with software failures in aviation? I suggest that we do not have a clue. Unlike the pump, a software problem can go unreported. With a software failure, maintenance usually just does a reset and the problem goes away and then may or may not reappear. I for one do not believe that we keep a complete record of these small failures. I have experienced it firsthand and I saw the reaction of both my company and the manufacturer.
In one case I was flying an Airbus. On descent, I was about to level at 10,000 feet. I was hand flying the aircraft with the auto-throttles off. I moved the throttles forward and got no response. At that point, I was cleared to 9000 ft. I told the first officer to check for a problem quickly. Everything was in the right place. Everything.
Nothing I could do restored my control of engine power. I was cleared to 7000 feet, and I knew that I would be staying there for some time. It was night, the weather was 400 overcast, and I was nowhere near an airport I could glide to. At 7000 ft., I purposely let the speed decay to what is known on the Airbus as Alpha Floor. This is a computer program that Airbus installed so that the aircraft cannot be stalled (not at all like Boeing’s MCAS). When Alpha Floor is reached, the aircraft is programmed to go to TOGA (takeoff/go-around) thrust. My hope was that if a computer glitch had gotten me into this predicament, then the computer might just get me out of it. It did. The aircraft responded as it was supposed to, everything was restored, and we landed without incident.
On landing I pulled the cockpit voice recorder and flight data recorder tapes and maintenance removed them. I did all that was required for such an incident and went to the hotel.
I was a commuter and when I returned home the next day, my wife was on the phone and told me it was for me. It was my company and Airbus in Toulouse, France. There was a great deal of concern over the incident—as there should have been. In the end, my company sent all the tapes to Airbus to investigate, they did not report it to the regulatory authorities and Airbus did a software change to all their aircraft using that system.
In 1984, I was flying between Dubai and Male, Maldives. I was flying a DC-8 and we had an Omega navigation system on board. It had just been installed and I had never used one before. At exactly the second that the sun broke the horizon that morning, the aircraft started to turn off track. The Omega was driving the autopilot at the time. If I had allowed it to continue, it would have kept turning. I found out later that there was shielding missing in the computer unit that caused this anomaly.
On the Boeing 747-200, some of the autopilots would suddenly and dramatically go into roll mode. I had this happen while flying to Paris one night. It happened to a Taiwanese flight flying from New York to Taipei. The aircraft rolled over onto its back at FL 390 and the crew did not regain control until around FL 170.
And one more. On my second-to-last flight before retirement, I was flying the polar route to Hong Kong. About 50 miles past the North Pole, we began to get master caution warnings: “Smoke in the Lavatory,” “Cargo Compartment Smoke,” and the very worst one on an Airbus, “Electrical Smoke or Fire.” These warnings came every seven minutes, and before we could react, each warning disappeared from the screen. This went on for the rest of the flight. I contacted my company via sat phone, and they got Airbus on the line. Airbus told me that, yes, there was a problem with the warning computer and they were aware of it. A fix was coming out in two weeks. A software fix.
These are just the problems I have had. Multiply this by the number of “electric” aircraft we have in the air and the number of problems will be staggering.
Who can you trust, what can you trust?
Ask yourself this: Why do we need all this fly-by-wire and so many other computer systems? Let’s face it, manual flight controls are much more intuitive and user-friendly for a pilot. You can feel an aircraft through manual controls; there is no feel in fly-by-wire. We have all this computer equipment because it is lighter and saves money. Weight is money, and airlines love it. Aircraft are designed by engineers and technicians, not by pilots. Yes, they do ask our opinion, but do they really incorporate it into the final design? Very little. The bottom line drives all of this, and nothing more.
As pilots, we should know what kind of testing is being done on software, who does it, and what the expectations are. Again, it is up to the regulators to do their job, but as I watch what is happening with the MAX, I fear they will allow economics to be the big winner.
I am about to leave flying for good. Many of you on the other hand are just starting. It behooves you to look deeply into this problem as it will probably affect you for the rest of your career.