fd.o services outage, annarchy $HOME lost

Daniel Stone daniel at fooishbar.org
Wed Sep 30 15:34:47 PDT 2009


Hi all,
Cutting and pasting from my blog entry[0] because I'm lazy:

> As ajax quite elegantly summed up[1], due to a series of catastrophic
> power failures at PSU, where fd.o is hosted, we were down for a good
> chunk of yesterday. Despite the machines having redundant power
> supplies connected to separate power rails in the rack, which were
> hooked up to independent, UPS-backed power supplies, we still
> (like a good chunk of Portland, and certainly everyone in the PSU
> machine room) lost our power.
> 
> As far as we can tell, when annarchy.fd.o (websites, people.fd.o, cgit,
> anongit, et al) came back up, power was again interrupted while the
> ext3 journal was being replayed. When it came up the nth time, fsck
> dumped almost the entire filesystem in lost+found, then started saying
> increasingly unhappy things about the state of the filesystem on its
> second pass. In the end, we just went with mkfs, and now we have a
> brand new and shiny filesystem.
> 
> It's worth pointing out that had this been any other filesystem, such
> as /srv, which hosts all project data, we would've been fine, as
> they're all backed up. But, unfortunately for some, we made a decision
> a while ago to not back /home up, and didn't advertise that as widely as
> we should have. So, if you had stuff in annarchy:/home, it's now gone,
> and I hope you have backups.
> 
> Sorry about that. On the upside, I got to see PSU's new and really very
> nice machine room this morning, thanks to XDC being about 250m away
> from the PSU machine room, and fd.o is otherwise running fine. We've
> been talking this week about replacing our ageing hardware, which would
> also allow for more redundancy as well as better performance from those
> machines. But we still have no plans to back up /home, so if you put
> stuff there, please, please keep your own backups (or make sure the
> Wayback Machine knows about it). 

Again, please accept our apologies.  The decision was made a long time
ago to not back up $HOME, and it was mentioned a few times, but
certainly not documented nearly as widely as it should've been (i.e.
shouted from the rooftops), given that a few people have lost data and
been upset about it.

The plan for when we get new hardware is to run the old hardware as
redundant backups, with regular rsyncs, so we can fail over and have
less downtime, as well as have a backup (of sorts) of $HOME.

Hopefully this doesn't put you guys out too much, and thanks for your
understanding.

Cheers,
Daniel

[1]: http://ajaxxx.livejournal.com/62015.html