This is not a big story, but I find it interesting. Last week American Airlines had its reservations computer system, called SABRE, go offline for most of a day, leading to the cancellation of more than 700 flights. Details are still sketchy (American has posted a video apology), but this is beginning to look like a classic example of a system that became too integrated and a company that was too dependent on a single technology.

To be clear, according to American the SABRE system itself did not fail; what failed was the airline’s access to its own system, a networking problem. And for further clarification, American no longer owns SABRE, which was spun off several years ago as Sabre Holdings, but the airline is still the system’s largest customer.

It’s interesting that Sabre Holdings has yet to say anything about this incident.

American built the first computerized airline reservation system back in the 1950s. It was so far ahead of its time that the airline not only had to write the software but had to build the hardware it ran on, too. Over the years competing systems were developed at other airlines, but some of those, including TWA’s and United’s, were splintered versions of SABRE. American has modernized and extended the same code base for over 50 years, which is long even by mainframe standards.

Today SABRE is probably the most intricate and complex system of its type on earth and Sabre Holdings sells SABRE technology to other industries like railroads and trucking companies. In many ways it is hard to dissociate the airline and the computer system, and that seems to be the problem they had last week.

The American SABRE system includes both a passenger reservation system and a flight operations system. Last week the passenger reservation system became inaccessible because of a networking issue. In addition to handling reservations, passenger check-in, and baggage tracking, the reservation system passes weight and location information to the flight operations system, which calculates flap settings and V speeds (target takeoff speeds based on aircraft weight and local weather) for each combination of flight and departure runway. Losing either system causes flight delays or cancellations, not just because the calculations then have to be done by hand, but because the company had become totally dependent on SABRE for running its business.

Without SABRE, American literally didn’t know where its airplanes were.

Here’s an example. SABRE has backup computer systems, but all of them depend on a microswitch on the nose gear of every American airliner to tell them when the plane has left the ground. That microswitch is the dreaded single point of failure. And while it may not have been that switch that failed in this instance, losing access to it is still a second-order failure, because if you can’t communicate with the microswitch it may as well be busted.
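Why a single shared input matters can be put in rough numbers with the textbook series/parallel availability model. The sketch below is only an illustration under stated assumptions: failures are independent, and the availability figures are hypothetical, not American’s. Redundant backups act in parallel (any one will do), but they all sit in series with one shared component, so the whole arrangement can never be more available than that one component.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies where any one suffices.

    Assumes the copies fail independently of each other.
    """
    return 1 - (1 - a) ** n


def system_availability(a_backup: float, n_backups: int, a_shared: float) -> float:
    """Redundant backups in series with one shared component (e.g. a single sensor).

    The shared component must be up AND at least one backup must be up.
    """
    return a_shared * parallel_availability(a_backup, n_backups)


# Hypothetical numbers for illustration only.
three_backups = parallel_availability(0.99, 3)       # about 0.999999
with_shared = system_availability(0.99, 3, 0.999)    # capped below 0.999
print(round(three_backups, 6), round(with_shared, 6))
```

Adding more backups pushes the parallel term toward 1, but the product never exceeds `a_shared`: extra redundancy downstream of a single point of failure buys almost nothing.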

That’s what happens with inbred systems that no one person fully understands. But it’s easy to get complacent, and American was used to having its systems up and running 24/7. The last significant computer outage at American, in fact, happened back in the 1980s.

That one was caused by a squirrel.