Monday, September 12, 2005

power is back...

The power is back and the console server came up. Now for trying to sort out everything else, hopefully without having to drive there.



update: alright, I got everything beat into running again except for the mailing list server. A few of the other boxes (on really really really old hardware) had a hard time getting started too, but after an extra power cycle and a manual fsck I got them going. x6, the list server, isn't as responsive though. Grrrh.



On the radio they're talking about "45 minutes traffic" from Hollywood to downtown on the 101 (when there's no traffic it takes just 10-15 minutes) so I'm hesitating a little bit running down there to look at it, but I will have to go soon as I can't be holed up down there for too long tonight.



update 2, much too much later: The mailing list mails are flowing again... (and the console redirection on that server is working again; one of the hard drives in the RAID not so much. grrh).



Power outage in Los Angeles, perl.org down

Apparently there's a Really Big power outage here in Los Angeles.



At my house the power flicked enough for the UPS to click in and for things not on the UPS to reboot, but there's power just fine now so I didn't think anything of it.



Ask: Circuit not working. Can't breathe!

Internap NOC: Los Angeles? Big power outage there. Traffic signals not working, police on "full tactical alert" and stuff.

Ask: Uh-oh! Okay.

Ask: Generators?

Internap NOC: They said they came on but shut down due to overload.

Ask: Doh.



Our servers are in the same building but on a different floor, so it's entirely likely that they are affected too. Grrh. I'll have some fun getting everything running again when the power comes back down there (Robert is traveling today).



Thursday, September 8, 2005

Datacenter day - update

That went mostly well. We got two dead servers assembled into one that'll probably work, or maybe not. It was installing slower than slowly over the network. Robert set it up to redirect the BIOS to the serial port and we hooked it into the console server so he'll try installing it from home later. We also hooked the new switch up to the console. Cyclades++.



We got all the disks in the new RAID working. It'll be very very cool when we start actually using them. The hardware RAID-5 with the 120GB disks we are using now isn't performing well at all.



While taking disks out and checking the connectors Robert pulled the disk we thought was an extra extra hot spare for the existing RAID only to minutes later realize it had a swap partition on it. Ooops!



MySQL was apparently swapped out a bit, because it crashed with a really really long dump to the log and some corrupted MyISAM indexes. All the InnoDB tables came up fine though. When InnoDB first came out I was much more comfortable with MyISAM because of how simple it was. From now on I'm only going to use MyISAM when its particular performance characteristics are needed.



Datacenter day

Robert and I are in the data center today (afternoon PST) replacing a bad disk or two and moving some equipment around.



Some of the services we run might be unavailable for a few moments once or twice, so don't be alarmed.



We have monitoring set up to alert us when things are not working, but if something is still not working by the end of the day (~1am UTC), please let us know. :)



update: over here....



Sunday, September 4, 2005

Create your own "BackPAN"

If you didn't know, "backpan" is all the PAUSE uploads but with no deletions (PAUSE, by Andreas König, is the Perl Authors Upload SErver which is the way the vast majority of content enters CPAN now).



Robert and I run a backpan at backpan.perl.org and Elaine is running an equivalent one at backpan.cpan.org.



We could use more though! It doesn't have to be public, but it'd be nice to know that we had a few more backups of old old CPAN distributions.



First, mirror CPAN if you aren't already; once a day is fine. Then, also once a day, run something like the following:



rsync --exclude CHECKSUMS -vrptgx ~/mirror/CPAN/authors/id/ ~/mirror/backpan/authors/id/



to copy new files into backpan. Notice there's no --delete argument to that rsync command.
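For a daily cron job, the command above could be wrapped roughly like this. It is only a sketch: the function name `update_backpan` is made up, the directory layout is assumed to match the rsync command above, and `-v` is dropped so cron stays quiet.

```shell
#!/bin/sh
# Hypothetical wrapper: copy new CPAN author uploads into a local backpan tree.
# Assumes the CPAN mirror itself has already been refreshed by another job.

update_backpan() {
    cpan_dir=$1      # e.g. ~/mirror/CPAN
    backpan_dir=$2   # e.g. ~/mirror/backpan

    mkdir -p "$backpan_dir/authors/id" || return 1

    # Deliberately no --delete: files removed from CPAN stay in the backpan.
    rsync --exclude CHECKSUMS -rptgx \
        "$cpan_dir/authors/id/" "$backpan_dir/authors/id/"
}
```

Calling `update_backpan ~/mirror/CPAN ~/mirror/backpan` once a day, after the regular mirror run, should be enough.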



Thursday, August 25, 2005

CPAN Ratings updates

Spent a couple hours this afternoon on CPAN Ratings and fixed and cleaned up a bunch of little things.



One of them was adding a link to the RSS feed. I didn't realize we forgot that when we added it. Oops! :-)



Of course I might have broken something too. Let me know if you see anything not working so well.



It's great that so many people have been "rating reviews" (marking them helpful or not). Next time I work on the site I'll make it do something useful with that data (besides just showing it next to each review).



Sunday, August 14, 2005

Link exchange requests

The perl.org webmaster email address gets link exchange requests from time to time. Today, we got a great one...




Hello,
I have found your website perl.org by searching Google for "greenzap scam". I think our websites has a similar theme, so I have already added your link to my website.
....


Needless to say, we don't respond.



update: check out the comment this entry got. Hilarious. (I deleted their link from it though...) - ask



Monday, July 25, 2005

Thanks To Thalasar

Special thanks to Brian Despain of Thalasar for buying us some new hard drives.



We're using the drives as part of our extended disk array to help improve performance and do better backups. It's very cool having a terabyte of disk to play with.



Please visit thalasar.com to learn about what Brian is up to with his open-source company.



Wednesday, July 20, 2005

I fought glibc and won

This morning I woke to a few strange emails from Ask and a ticket that the Parrot TODO list was broken. The gist of it was: getprotobyname was causing our production mod_perl's to crash. (See entry from earlier today for a stacktrace.)



Obviously, not a good thing.



Question One: What changed? Neither Ask nor I could remember changing anything that *should* have affected this recently. The easy solution (revert) was out.


Step one: Reproduce and isolate. This turned out to be the easy part. I configured a bare-bones Combust with only a single website configured with only a single controller:



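package Test::Control::Test;
use base 'Combust::Control';
use DBD::mysql;     # merely loading this is what makes the crash appear
use LWP::Simple;

# mod_perl 1 method handler: fetch a page over HTTP and send it back
sub handler ($$) {
    my ($self, $r) = @_;
    my $output = LWP::Simple::get("http://www.cnn.com");
    $self->send_output(\$output, 'text/html');
}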

Without the use DBD::mysql, everything was fine. With it, KABOOM.


Step two: Debug. By telling apache not to fork, it was easier to track things down:

/pkg/apache1/bin/httpd -X -f /home/robert/minisite/apache/conf/httpd.conf

GDB wasn't particularly helpful. It got me the stack trace, and told me it was happening during DSO symbol lookup. (Smells like glibc!) Hrm, might be a problem with something scribbling over memory. Let's try Valgrind. No luck, the older version of valgrind we have on the system bombs out.


Ask found this RedHat Bugzilla ticket. It is a similar problem, but didn't get resolved. It did lead me closer to the solution. As suggested in the ticket, I ran my apache with all of the dynamic loader debugging enabled: LD_DEBUG=all LD_DEBUG_OUTPUT=/tmp/some-file . The copious (16MB) output ended like this:



4702: symbol=_nss_files_parse_protoent; lookup in file=/pkg/packages/apache-1.3.33/libexec/mod_setenvif.so
4702: symbol=_nss_files_parse_protoent; lookup in file=/pkg/packages/apache-1.3.33/libexec/libperl.so
4702: symbol=_nss_files_parse_protoent; lookup in file=/lib/libnsl.so.1
4702: symbol=_nss_files_parse_protoent; lookup in file=/lib/libutil.so.1

It should have looked something like this: (from earlier in some-file)



4702: symbol=strlen; lookup in file=/pkg/apache1/bin/httpd
4702: symbol=strlen; lookup in file=/lib/tls/libm.so.6
4702: symbol=strlen; lookup in file=/lib/libcrypt.so.1
4702: symbol=strlen; lookup in file=/usr/lib/libgdbm.so.2
4702: symbol=strlen; lookup in file=/lib/libdl.so.2
4702: symbol=strlen; lookup in file=/lib/tls/libc.so.6
4702: binding file /pkg/packages/apache-1.3.33/libexec/mod_log_config.so to /lib/tls/libc.so.6: normal symbol `strlen' [GLIBC_2.0]

So, we now know that the problem is the dynamic linker is having trouble binding the symbol _nss_files_parse_protoent (which lives in /lib/libnss_files.so) and that has something to do with DBD::mysql. We also know that mysql.so (the C portion of DBD::mysql) is linked against libnss_files. (See ldd output or information from the some-file.)
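With 16MB of loader output, spotting the symbol that never got bound by eye is tedious. A rough filter along these lines could do it instead. This is a sketch, not what we actually ran: the function name `unbound_symbols` is made up, and it assumes log lines shaped like the excerpts above.

```shell
#!/bin/sh
# Sketch: list symbols the dynamic linker looked up but never bound.
# Assumed log line shapes (taken from the excerpts above):
#   PID: symbol=NAME; lookup in file=PATH
#   PID: binding file A to B: normal symbol `NAME' [VERSION]

unbound_symbols() {
    awk '
        /symbol=[^;]*;[ \t]*lookup in file=/ {
            s = $0
            sub(/^.*symbol=/, "", s)          # drop everything before the name
            sub(/;.*$/, "", s)                # drop everything after it
            if (!(s in seen)) { seen[s] = 1; order[++n] = s }
        }
        /binding file .* symbol `/ {
            s = $0
            sub(/^.*symbol `/, "", s)         # keep text after the opening quote
            sub(/[^A-Za-z0-9_].*$/, "", s)    # cut at the closing quote
            bound[s] = 1
        }
        END {
            # Print, in order of first appearance, every symbol without a binding.
            for (i = 1; i <= n; i++)
                if (!(order[i] in bound)) print order[i]
        }
    ' "$@"
}
```

glibc appends the process ID to the LD_DEBUG_OUTPUT name, so the call would look something like `unbound_symbols /tmp/some-file.4702`. Symbols that legitimately resolve nowhere show up in the list too, so treat the output as a list of suspects rather than a verdict.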


That struck me as odd, so I attempted rebuilding DBD::mysql to see why it was linking against the nss libraries. That's generally something that should be sucked in by libresolv. Definitely, a general purpose application shouldn't be linking against specific nss ("Name Service Switch") libraries.


Turns out, DBD::mysql was getting the information from mysql_config.


--libs [-L/usr/lib/mysql -lmysqlclient -lz -lcrypt -lnsl -lm -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv]

Whoa! Duplication, redundancy, extra libraries, and explicit linking against libc. Definitely not something that most applications should do. I could understand that MySQL itself might need to do weird things - it's a complicated application - but things linking against it shouldn't have to.
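To make the problems in that flag list visible without squinting, something like the following could be used. It is a sketch with a made-up function name (`audit_libs`); it flags repeated `-l` options and any direct reference to an NSS module.

```shell
#!/bin/sh
# Sketch: audit a linker flag string for repeated -l flags and NSS modules.
# The function name is hypothetical; pass the flags as a single argument.

audit_libs() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | awk '
        /^-l/ {
            count[$0]++
            # Report a flag the second time it is seen.
            if (count[$0] == 2) print "duplicate: " $0
            # Report NSS modules once, on first sight.
            if ($0 ~ /^-lnss_/ && count[$0] == 1) print "nss module: " $0
        }
    '
}
```

Typical use would be `audit_libs "$(mysql_config --libs)"`. Empty output means the list has no repeats and no explicit NSS libraries.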


Step three: Fix it.


--- mysql_config.old 2005-07-20 16:01:34.000000000 -0700
+++ mysql_config 2005-07-20 16:02:06.000000000 -0700
@@ -86,10 +86,10 @@
 # Create options
 libs="$ldflags -L$pkglibdir -lmysqlclient -lz -lcrypt -lnsl -lm "
-libs="$libs -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv"
+libs="$libs -lc -lresolv"
 libs=`echo "$libs" | sed -e 's; \+; ;g' | sed -e 's;^ *;;' | sed -e 's; *\$;;'`
-libs_r="$ldflags -L$pkglibdir -lmysqlclient_r -lz -lpthread -lcrypt -lnsl -lm -lpthread -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv "
+libs_r="$ldflags -L$pkglibdir -lmysqlclient_r -lz -lpthread -lcrypt -lnsl -lm -lpthread -lc -lresolv "
 libs_r=`echo "$libs_r" | sed -e 's; \+; ;g' | sed -e 's;^ *;;' | sed -e 's; *\$;;'`
 cflags="-I$pkgincludedir -O2 -mcpu=i486 -fno-strength-reduce " #note: end space!
 include="-I$pkgincludedir"

That's cheating. But it got the job done. After making that change, I rebuilt DBD::mysql. By not explicitly linking mysql.so against -lnss_files, the internal magic of glibc could do the right thing and not blow up.
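One way to confirm that a rebuilt mysql.so no longer pulls in the NSS modules directly is to filter its ldd output. A minimal sketch, with a made-up function name (`nss_deps`) and a placeholder path in the usage line:

```shell
#!/bin/sh
# Sketch: read `ldd` output on stdin and print any libnss_* dependencies.
# Prints nothing when the object is not linked against NSS modules.

nss_deps() {
    awk '$1 ~ /^libnss_/ { print $1 }'
}
```

For example, `ldd /path/to/auto/DBD/mysql/mysql.so | nss_deps` should print nothing after the rebuild (the path here is a placeholder for wherever your perl installs the module).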


CPAN Ratings posting works again

Earlier this afternoon Robert got our DBD::mysql recompiled after some mysql_config hackery so it stopped making our httpd crash.



I've made another change to the site too, so you can now post a "review" without giving the distribution a rating, effectively making it a comment. Use that to comment on other people's reviews, please. :-)



A dozen people have started marking reviews as helpful / not helpful.  It'll be interesting to see how that data set will grow over time.



Tuesday, July 19, 2005

CPAN Ratings broken #2

The slightly updated version of the CPAN Ratings site is up now.



It looks the same, but inside it got a decent house cleaning; and it's now using our new beta-ish authentication server instead of the old hack (we didn't call it the "ducttape API" for nothing!).



That's the good news.  The bad news is that the httpd is coredumping on "getprotobyname".



#0  0x00d5057b in do_lookup_versioned () from /lib/ld-linux.so.2
#1  0x00d4f776 in _dl_lookup_versioned_symbol_internal () from /lib/ld-linux.so.2
#2  0x00d53473 in fixup () from /lib/ld-linux.so.2
#3  0x00d53330 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#4  0x006c76d4 in _nss_files_getprotobyname_r () from /lib/libnss_files.so.2
#5  0x00203752 in getprotobyname_r@@GLIBC_2.1.2 () from /lib/tls/libc.so.6
#6  0x009945a9 in Perl_pp_gprotoent () from /pkg/packages/apache-1.3.33/libexec/libperl.so



I hacked around it in a few places to make the site appear to work (and you can vote reviews as helpful or not helpful!), but you still can't add a new review. I'd keep hacking on it, but it's 4:30 and I have things to do Wednesday so I can't stay up much longer... Zzzzzz...



update: it's working again.



search.cpan.org Uploads RSS 1.0 feed

Many people have not been able to find the RSS feed for CPAN uploads that search has been providing for quite some time. The only link was in the FAQ. It was also only in RSS 0.91 and did not contain much information.



Well, now there is a new feed at http://search.cpan.org/uploads.rdf, which is linked in the header of each page so discovery tools can find it. There is also a link on the recent uploads page.



Monday, July 18, 2005

cpanratings.perl.org broken

cpanratings.perl.org is in a bit of a mess.  Posting new reviews has been broken for a while and I just super-broke it.   I was working on a few new features and while I had a separate branch for the code I forgot to do the same for the templates.  Oops.



Instead of reverting it back I'll just press on to get the new stuff out in the next few days.  Thank you for your patience. :-)



update: almost fixed.