Priya Narasimhan, and pictures of some of her research projects (football engineering, YinzCam, Trinetra)

Disclaimer: This list is not complete or up-to-date. I have long been collecting these kinds of anecdotes to motivate our own fault-tolerance research, and wanted to make them available to others. Please email me if you know of any that I am missing.

Causes of Failure in Web Applications, Soila M. Pertet and Priya Narasimhan, Technical Report PDL-CMU-05-109, Carnegie Mellon University, December 2005

Microsoft lapse causes outages in Azure service, San Francisco Chronicle, February 22, 2013
Blackberry network down worldwide, Softpedia, October 10, 2011
Major Amazon EC2 Outage in US-East-1 Region (affects Reddit, Quora, HootSuite, more), April 21, 2011
Power Outage for Amazon Data Center, December 10, 2009
Lightning Strike Triggers Amazon EC2 Outage, June 2009
Outage for Amazon Web Services (packet-loss problems), July 2009
Explosion at The Planet causes outage, June 1, 2008
Amazon EC2 Outage Wipes Out Data, October 2007
Major Outage for Amazon S3 and EC2, February 2008

NASA loses communication with International Space Station during software update, The Verge, February 19, 2013
Online banking upgrade contributed to Bank of America outage, American Banker, October 6, 2011
Google suffers first Gmail outage of 2011, eWeek, February 28, 2011
.Mac migration to MobileMe hits some roadblocks (Apple's upgrade of .Mac to MobileMe service), July 10, 2008
Failed Windows XP upgrade downs 60,000 UK Government PCs, November 2004
Vonage voicemail vanishes during site upgrade, July 2005
MCI upgrade glitch disrupts ATM service, August 1999
Software upgrade glitch grounds Los Angeles airport, October 2000
EDS: IT upgrade caused software glitch at UK agency, December 2004
The truth: What caused the AT&T frame relay network outage, February 2001
Upgrade glitch downs AT&T Wireless' CRM system, November 2001
eBay says system upgrade tied to recent outages, October 2000

Schwab Outage: IT Wake-Up Call, InternetWeek, February 26, 1999
Stock Market Outages Highlight Software Availability Issues, The Payne Report, July/August 2001
Outages at eBay:
    -- eBay Retrenches, InternetWeek, June 17, 1999
    -- eBay Servers Go Down -- Again, InternetWeek, June 30, 1999
    -- eBay Crashes Again, InternetWeek, August 9, 1999
    -- eBay goes back online after prolonged outages, CNET, January 4, 2001
Outages at E-Trade:
    -- E-Trade Suffers Outage, InternetWeek, February 3, 1999
    -- Another Outage at E-Trade, InternetWeek, February 4, 1999
    -- E-Trade Explains Cause of Last Week's Outage, InternetWeek, February 10, 1999
    -- E-Trade CIO Discusses the Outages, InternetWeek, February 10, 1999
Points of Reference: e-Commerce Failures, Online Brokerage Industry Report, 1999

Humming along with technology, until it's not, Washington Post, November 5, 2009 (computer failure disrupting 750 traffic lights)
Human error called culprit in 3 rocket launch failures, Florida Today Space Online, June 16, 1999
Ariane 5 Flight 501 Failure, Inquiry Board Report, Prof. J. L. Lions, July 1996
Electric Blackout of August 2003 in Northeast US and Canada:
    -- Reports from the North American Electric Reliability Council
    -- Reports from the US Department of Energy
Therac-25 radiation incidents:
    -- Investigation of the Therac-25 accidents, Leveson & Turner, July 1993
    -- Therac-25 case materials
Navy Smart Ship USS Yorktown:
    -- The Smart Ship is not the answer, U.S. Naval Institute Proceedings, June 1998
    -- Software glitches leave Navy Smart Ship dead in the water, Government Computer News, July 13, 1998

Netcraft News and Surveys
Slashdot stories on bugs, viruses and downtime

Last updated: 24 February 2013, Priya Narasimhan