Internet: Cloud: Disasters

The ten worst cloud disasters (article of 8 November 2012 in Computerwoche)

Amazon's report on the outage in August



Excerpts and comments

2011-08-07 10:41 Loss of the 110 kV power supply, attributed to a lightning strike. The electronic switchover to backup power failed. The UPS batteries were drained after a short time.
2011-08-07 11:05 The first failures appeared. (At 11:05 AM PDT, we were seeing launch delays and API errors in all EU West Availability Zones.)
2011-08-07 11:54 Some backup generators were brought online by synchronizing them manually. (At 11:54 AM PDT, we had been able to bring some of the backup generators online by manually phase-synchronizing the power sources. This restored power to many of the EC2 instances and EBS volumes, but a majority of the networking gear was still without power, so these restored instances were still inaccessible.)
2011-08-07 13:49 Enough network devices had power again that the connection to the Internet could be re-established. (By 1:49 PM PDT, power had been restored to enough of our network devices that we were able to re-establish connectivity to the Availability Zone.)
2011-08-07       Many servers tried to synchronize (re-mirror) their data with one another at the same time. There was not enough capacity for this. (We ran out of spare capacity before all of the volumes were able to successfully re-mirror.)
2011-08-09 06:04 About 38% of the recovery snapshots had been delivered to customers. (By 6:04 AM PDT on August 9th, we had delivered approximately 38% of the recovery snapshots for these potentially inconsistent volumes to customers. By 2:37 AM PDT on August 10th, 85% of the recovery snapshots had been delivered. By 8:25 PM PDT on August 10th, we were 98% complete, with the remaining few snapshots requiring manual attention.)
2011-08-10 02:37 85% of the recovery snapshots had been delivered.
2011-08-10 20:25 98% of the recovery snapshots had been delivered.
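
To make the length of the recovery easier to judge, the timestamps listed above can be converted into elapsed time since the power loss. This is only a small arithmetic sketch: the event labels and the function names `elapsed_since_start` and `format_delta` are made up for illustration, and the times are taken as PDT exactly as given in the quotations.

```python
from datetime import datetime

# Timestamps as listed in the timeline (PDT, year 2011).
# The labels are shorthand chosen for this sketch, not wording from the report.
EVENTS = [
    ("power lost", "2011-08-07 10:41"),
    ("first API errors", "2011-08-07 11:05"),
    ("some generators online", "2011-08-07 11:54"),
    ("connectivity restored", "2011-08-07 13:49"),
    ("38% of recovery snapshots delivered", "2011-08-09 06:04"),
    ("85% of recovery snapshots delivered", "2011-08-10 02:37"),
    ("98% of recovery snapshots delivered", "2011-08-10 20:25"),
]


def elapsed_since_start(events):
    """Return (label, timedelta since the first event) for every event."""
    parsed = [(label, datetime.strptime(ts, "%Y-%m-%d %H:%M")) for label, ts in events]
    start = parsed[0][1]
    return [(label, when - start) for label, when in parsed]


def format_delta(delta):
    """Format a timedelta as whole hours and minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} h {minutes:02d} min"


if __name__ == "__main__":
    for label, delta in elapsed_since_start(EVENTS):
        print(f"{format_delta(delta):>12}  {label}")
```

Read this way, connectivity came back a little over three hours after the power loss, while delivery of the recovery snapshots stretched over more than three days.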
EBS Software Bug Impacting Snapshots
Separately, and independent from issues emanating from the power disruption, we discovered an error in the EBS software that cleans up unused storage for snapshots after customers have deleted an EBS snapshot.
The existing backups (snapshots) were unusable: In one of the days leading up to the Friday, August 5th deletion run, there was a hardware failure that the snapshot cleanup identification software did not correctly detect and handle. The result was that the list of snapshot references used as input to the cleanup process was incomplete. … The human checks in this process failed to detect the error.
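
Why an incomplete reference list is so dangerous can be shown with a toy model. The following is a simplified sketch, not Amazon's implementation: the block and snapshot names, the functions `find_unreferenced_blocks` and `safe_cleanup`, and the idea of comparing against a list of expected snapshot IDs are all assumptions made for illustration.

```python
def find_unreferenced_blocks(all_blocks, snapshot_refs):
    """Return the blocks that no snapshot in snapshot_refs points to.

    snapshot_refs maps a snapshot ID to the set of blocks it references.
    Anything returned here would be deleted by a naive cleanup run.
    """
    referenced = set()
    for blocks in snapshot_refs.values():
        referenced.update(blocks)
    return set(all_blocks) - referenced


def safe_cleanup(all_blocks, snapshot_refs, expected_snapshot_ids):
    """Hypothetical guard: refuse to delete if any known snapshot is missing."""
    missing = set(expected_snapshot_ids) - set(snapshot_refs)
    if missing:
        raise RuntimeError(f"reference list incomplete, missing: {sorted(missing)}")
    return find_unreferenced_blocks(all_blocks, snapshot_refs)


if __name__ == "__main__":
    all_blocks = {"b1", "b2", "b3", "b4"}
    complete = {"snap-a": {"b1", "b2"}, "snap-b": {"b2", "b3"}}
    # Simulated failure: snap-b was not enumerated, so its references are lost.
    incomplete = {"snap-a": {"b1", "b2"}}

    print(sorted(find_unreferenced_blocks(all_blocks, complete)))    # only b4 is garbage
    print(sorted(find_unreferenced_blocks(all_blocks, incomplete)))  # b3 is wrongly included
    try:
        safe_cleanup(all_blocks, incomplete, expected_snapshot_ids={"snap-a", "snap-b"})
    except RuntimeError as err:
        print("cleanup aborted:", err)
```

In the toy model, block `b3` still belongs to a live snapshot but is marked for deletion as soon as that snapshot drops out of the input list. A completeness check of this kind is one conceivable safeguard; the quotation does not say which checks Amazon actually used.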
We learned a number of lessons from this event.

Additionally, any customers impacted by the EBS software bug that accidentally deleted blocks in their snapshots will receive a 30 day credit for 100% of their EBS usage.

All quotations are from:


Microsoft (MS) also suffered a power outage, because it was supplied by the same utility as Amazon. MS could only synchronize its generators by hand (MS used the same control electronics for its backup generators as Amazon). Credit: ¼ of a monthly bill.


Data protection law is usually not complied with


Use of subcontractors

