February 28, 2017 AWS S3 downtime

Amazon Web Services had downtime yesterday. Their S3 storage service had what was described as elevated error rates in the US-East zone, and as a result failures started in other AWS services in US-East that depended on S3.

The failures were reminiscent of the DynamoDB downtime of September 20, 2015, in that seemingly unrelated services share a common failure mode because of unforeseen interdependencies.

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. Leslie Lamport, 1987

The current Techmeme roundup has a good summary of the impacts on the net. Notably, Twitter was unaffected, but some elements of Slack like file sharing failed to work. And crucially Amazon’s own ability to communicate with its many customers was the first system to be noticably affected; their status page showed all green checkmarks for more than an hour after the problem was noticed, with the explanation coming that the status page was itself dependent on services that were down.

Some news media sites were also down or running in impaired service, prompting the Associated Press wire service to use Facebook to issue its news story about the AWS outage.