svn is running again. I'm very glad I didn't try to repair it last night, because all of the repositories would have ended up vaporized as I slumped over the keyboard and triggered the rm -rf / macro I have bound to Ctrl-Alt-2oSDLFHq.
Friday, July 28, 2006
We had another "power event" today, and at least one of the repositories on svn.perl.org ended up with some minor corruption. I need sleep, so I'm going to bed. I'll fix things in the morning. (I don't want to do a rush job tonight and mess something else up.)
Monday, July 24, 2006
Sunday, July 23, 2006
It has not been a good weekend for our datacenter. It all started on Saturday, when Los Angeles experienced record-breaking temperatures. (I spent the afternoon outside, and it was a scorcher... 110 degrees plus.) There was a power failure in the building that hosts our datacenter.
The backup power kicked in, but failed after a while when things got too hot. Our machines lost power briefly, and all but one came back up. Because Murphy's law always takes effect when you least want it to, the machine that didn't come back up was the one that hosts all the perl.org mailing lists. The datacenter personnel attempted to reset it, but as they were dealing with many other customers (much more important than us, and I don't blame them), they didn't have time to hook a monitor up to our system and see what was going on. So, at 11pm, I drove down to the datacenter, only to find that all it wanted was "press F1 to continue". Further diagnosis showed that the BIOS battery was dead and the case-open sensor kept tripping. Even if the BIOS was set not to prompt, it would "conveniently" forget that setting. (Did I mention that I was leaving for OSCON in Portland in the morning?)
Today, we received a note that we might lose power due to emergency maintenance the building was going to perform to repair electrical damage caused by yesterday's outage. So, instead of having to deal with fscking after another abrupt power loss, we shut down all of the systems. Several hours later, we attempted to turn them back on, but only 50% came up! The datacenter staff helped reset the rest, and gave the ornery mailing list box from yesterday the "F1" treatment. Everything is back up and happy now.
I know that several other companies hosted in the same building lost power, and not just in our datacenter. One, a large Perl shop, is still down, going on six hours now. With larger deployments, heat dissipation is a bigger concern, so they need to wait for things to cool down. I'm very happy with our hosting arrangements: they've been very helpful with getting boxes reset, and I know things are worse for them than they are for us.
This weekend has exposed some weaknesses in our architecture, and we're going to be working over the next few months to address them. While it doesn't make sense for us to have a fully distributed system, we could definitely use more redundancy in some core systems. We'll probably be posting an updated "wish list" here soon.
Fingers crossed that the rest of this week goes smoothly. It's no fun having to deal with a datacenter from hundreds of miles away.
The facility we are in needs to turn off the power for 4-6 hours (!!!) starting around 3pm PST to repair the UPS and switch back to utility power. Yikes, really bad timing with the hackathon at OSCON and all. :-(
Our servers will shut down around 10 minutes before that and will hopefully come back up when the power returns. One or two might need an extra kick, which we'll take care of tonight or tomorrow morning.
More when we know more.
(and of course our European search.cpan.org mirror is out too, so we can't even keep that running. Grrrh).
Saturday, July 22, 2006
The perl.org mailing lists are back up. (Thanks to Robert who went to the datacenter to kick it!).
In related news it seems like we could use a new 1U box (with two disks, preferably SCSI) to run the mailing lists. Email firstname.lastname@example.org if you have something to spare.
When the power went out, the UPS kicked in (and then the generators), but apparently the HVAC system failed, and ~30 minutes later something overheated and cut our power momentarily. Whee. :-(
Our datacenter had a power outage, including the UPS systems (!). Some things didn't start up properly, which kept other things from starting up properly, etc., etc.
We are working on it and everything should be back up shortly(-ish). Right now we're waiting for fsck to run on a ~400GB partition on an incredibly slow RAID-5 system ("/dev/vg1/lv0 has gone 309 days without being checked, check forced.").
Update ~19:40PST: We have almost everything running again. The mailing list server didn't seem to come back after the reboot, so there's no mailing list mail yet. Robert is calling the datacenter to see if they can put a crash cart on it.
Tuesday, July 18, 2006
Wednesday, July 12, 2006
We posted in May about needing a volunteer to hack up a simple script. Still do!
This time we have a few pointers ready, though, so hopefully we can get a volunteer started before he or she gets distracted and disappears on us.
As a slightly larger task, we could also use help from someone interested in hacking on pgeodns, our geographic load-balancing name server written in Perl.
Email email@example.com if you are interested and I'll get you access to our little Wiki.
Monday, July 3, 2006
Finally, the moment you've all been waiting for is here!
perlbug (http://rt.perl.org/rt3/) has been upgraded to the latest and greatest version (RT 3.6).
Here are some changes you might notice:
- a shiny new look
- no more auth.perl.org, we now authenticate directly from bitcard.org
- a public interface that doesn't require you to log in to see tickets
- a much more powerful search interface
- things that were slow before are not quite so slow anymore
- saved searches
You'll likely discover that some things are broken or don't work the way they used to. Here are a few we know about:
- Old bookmarked searches can't be used anymore. (Sorry!)
- Some bitcard accounts (with accented characters in their names) can't login.
- We have a mild performance issue related to CSS caching.
If you run into issues, big or small, please send an email to perlbug-admin at perl.org. Messages will be answered in the order received.
(special thanks to Jesse Vincent, Kevin Riggle, Thomas Sibley, and all the rest of the gang at Best Practical, for their help, patience, and for the rt.cpan.org customizations, which made this much easier than it might have been)