Monday, July 25, 2005

Thanks To Thalasar

Special thanks to Brian Despain of Thalasar for buying us some new hard drives.



We're using the drives as part of our extended disk array to help improve performance and do better backups. It's very cool having a terabyte of disk to play with.



Please visit thalasar.com to learn about what Brian is up to with his open-source company.



Wednesday, July 20, 2005

I fought glibc and won

This morning I woke to a few strange emails from Ask and a ticket that the Parrot TODO list was broken. The gist of it was: getprotobyname was causing our production mod_perl's to crash. (See entry from earlier today for a stacktrace.)



Obviously, not a good thing.



Question One: What changed? Neither Ask nor I could remember changing anything that *should* have affected this recently. The easy solution (revert) was out.


Step one: Reproduce and isolate. This turned out to be the easy part. I configured a bare-bones Combust with only a single website configured with only a single controller:



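package Test::Control::Test;
use strict;
use warnings;
use base 'Combust::Control';
use DBD::mysql;    # merely loading this is enough to trigger the crash
use LWP::Simple;

# mod_perl 1 method handler: fetch a remote page and echo it back.
sub handler ($$) {
    my ($self, $r) = @_;
    my $output = LWP::Simple::get("http://www.cnn.com");
    $self->send_output(\$output, 'text/html');
}

1;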

Without the use DBD::mysql, everything was fine. With it, KABOOM.


Step two: Debug. Telling Apache not to fork made it easier to track things down:

/pkg/apache1/bin/httpd -X -f /home/robert/minisite/apache/conf/httpd.conf

GDB wasn't particularly helpful. It got me the stack trace and told me the crash was happening during DSO symbol lookup. (Smells like glibc!) Hrm, it might be a problem with something scribbling over memory. Let's try Valgrind. No luck: the older version of Valgrind we have on the system bombs out.


Ask found this RedHat Bugzilla ticket. It describes a similar problem that was never resolved, but it did lead me closer to the solution. As suggested in the ticket, I ran Apache with all of the dynamic loader debugging enabled: LD_DEBUG=all LD_DEBUG_OUTPUT=/tmp/some-file. The copious (16MB) output ended like this:



4702: symbol=_nss_files_parse_protoent; lookup in file=/pkg/packages/apache-1.3.33/libexec/mod_setenvif.so
4702: symbol=_nss_files_parse_protoent; lookup in file=/pkg/packages/apache-1.3.33/libexec/libperl.so
4702: symbol=_nss_files_parse_protoent; lookup in file=/lib/libnsl.so.1
4702: symbol=_nss_files_parse_protoent; lookup in file=/lib/libutil.so.1

It should have looked something like this: (from earlier in some-file)



4702: symbol=strlen; lookup in file=/pkg/apache1/bin/httpd
4702: symbol=strlen; lookup in file=/lib/tls/libm.so.6
4702: symbol=strlen; lookup in file=/lib/libcrypt.so.1
4702: symbol=strlen; lookup in file=/usr/lib/libgdbm.so.2
4702: symbol=strlen; lookup in file=/lib/libdl.so.2
4702: symbol=strlen; lookup in file=/lib/tls/libc.so.6
4702: binding file /pkg/packages/apache-1.3.33/libexec/mod_log_config.so to /lib/tls/libc.so.6: normal symbol `strlen' [GLIBC_2.0]
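The telltale difference between the two excerpts is that the failing symbol never gets a "binding file" line. In a 16MB log, a small filter can list such symbols. This is only a rough sketch that assumes the line format shown in the excerpts; unbound_symbols is a made-up helper name, and its output is a list of candidates, since some symbols legitimately go unbound.

```shell
# unbound_symbols (hypothetical helper): reads LD_DEBUG output on stdin and
# prints each symbol that has lookup lines but no matching binding line.
unbound_symbols() {
    awk '
        # Lookup line: "4702: symbol=strlen; lookup in file=/lib/libdl.so.2"
        /symbol=[^;]*; *lookup in file=/ {
            sym = $0
            sub(/^.*symbol=/, "", sym)   # drop everything up to the name
            sub(/;.*$/, "", sym)         # drop "; lookup in file=..."
            if (!(sym in seen)) { seen[sym] = 1; order[++n] = sym }
        }
        # Binding line: "... binding file A to B: normal symbol <name> [VERSION]"
        /binding file .* symbol / {
            sym = $0
            sub(/^.* symbol /, "", sym)        # leaves the quoted name and version
            sub(/ .*$/, "", sym)               # drop the version tag
            gsub(/[^A-Za-z0-9_.$@]/, "", sym)  # strip the surrounding quote marks
            bound[sym] = 1
        }
        END {
            # Report in order of first lookup so the last line is the freshest.
            for (i = 1; i <= n; i++)
                if (!(order[i] in bound)) print order[i]
        }
    '
}
```

Usage would be something like unbound_symbols < /tmp/some-file.4702 (glibc appends the process ID to the LD_DEBUG_OUTPUT name). For a crash, the last symbol printed is the most interesting one.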

So, we now know that the problem is the dynamic linker is having trouble binding the symbol _nss_files_parse_protoent (which lives in /lib/libnss_files.so) and that has something to do with DBD::mysql. We also know that mysql.so (the C portion of DBD::mysql) is linked against libnss_files. (See ldd output or information from the some-file.)
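ldd also lists libraries that are pulled in indirectly, so a more direct check is to look at the NEEDED entries of the shared object itself. A minimal sketch, assuming binutils' objdump is available; nss_needed is a made-up helper name.

```shell
# nss_needed (hypothetical helper): reads the output of "objdump -p some.so"
# on stdin and prints the libnss_* libraries the object links against directly.
nss_needed() {
    awk '$1 == "NEEDED" && $2 ~ /^libnss_/ { print $2 }'
}
```

It would be run as objdump -p path/to/mysql.so | nss_needed, where the path is a placeholder for wherever DBD::mysql's mysql.so is installed. Empty output means the object does not name any nss library explicitly.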


That struck me as odd, so I attempted rebuilding DBD::mysql to see why it was linking against the nss libraries. That's generally something that should be sucked in by libresolv. Definitely, a general purpose application shouldn't be linking against specific nss ("Name Service Switch") libraries.


Turns out, DBD::mysql was getting the information from mysql_config.


--libs [-L/usr/lib/mysql -lmysqlclient -lz -lcrypt -lnsl -lm -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv]

Whoa! Duplication, redundancy, extra libraries, and explicit linking against libc. Definitely not something that most applications should do. I could understand that MySQL itself might need to do weird things - it's a complicated application - but things linking against it shouldn't have to.


Step three: Fix it.


--- mysql_config.old 2005-07-20 16:01:34.000000000 -0700
+++ mysql_config 2005-07-20 16:02:06.000000000 -0700
@@ -86,10 +86,10 @@
# Create options
libs="$ldflags -L$pkglibdir -lmysqlclient -lz -lcrypt -lnsl -lm "
-libs="$libs -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv"
+libs="$libs -lc -lresolv"
libs=`echo "$libs" | sed -e 's; \+; ;g' | sed -e 's;^ *;;' | sed -e 's; *\$;;'`
-libs_r="$ldflags -L$pkglibdir -lmysqlclient_r -lz -lpthread -lcrypt -lnsl -lm -lpthread -lc -lnss_files -lnss_dns -lresolv -lc -lnss_files -lnss_dns -lresolv "
+libs_r="$ldflags -L$pkglibdir -lmysqlclient_r -lz -lpthread -lcrypt -lnsl -lm -lpthread -lc -lresolv "
libs_r=`echo "$libs_r" | sed -e 's; \+; ;g' | sed -e 's;^ *;;' | sed -e 's; *\$;;'`
cflags="-I$pkgincludedir -O2 -mcpu=i486 -fno-strength-reduce " #note: end space!
include="-I$pkgincludedir"

That's cheating, but it got the job done. After making that change, I rebuilt DBD::mysql. By not explicitly linking mysql.so against -lnss_files, the internal magic of glibc could do the right thing and not blow up.
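A less invasive variant of the same idea is to leave mysql_config alone and filter its output before handing it to the build. The sketch below is an illustration, not what was actually done here: strip_nss_libs is a made-up name, and dropping duplicate flags is only safe for dynamic linking, where library order rarely matters.

```shell
# strip_nss_libs (hypothetical helper): takes a linker flag string as $1,
# drops explicit -lnss_* libraries and repeated flags, and prints the rest
# in their original order.
strip_nss_libs() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | awk '
        /^-lnss_/   { next }    # explicit Name Service Switch libraries
        $0 == ""    { next }    # stray empty tokens
        !seen[$0]++ { out = out (out == "" ? "" : " ") $0 }
        END         { print out }
    '
}
```

Fed the --libs output shown earlier, this yields the same flag list as the patched script. If the DBD::mysql at hand accepts a --libs option to Makefile.PL (an assumption worth checking against its install notes), the result could be passed along as perl Makefile.PL --libs="$(strip_nss_libs "$(mysql_config --libs)")".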


CPAN Ratings posting works again

Earlier this afternoon Robert got our DBD::mysql recompiled after some mysql_config hackery so it stopped making our httpd crash.



I've made another change to the site too, so you can now post a "review" without giving the distribution a rating, effectively making it a comment. Use that to comment on other people's reviews, please. :-)



A dozen people have started marking reviews as helpful / not helpful.  It'll be interesting to see how that data set will grow over time.



Tuesday, July 19, 2005

CPAN Ratings broken #2

The slightly updated version of the CPAN Ratings site is up now.



It looks the same, but inside it got a decent house cleaning; and it's now using our new beta-ish authentication server instead of the old hack (we didn't call it the "ducttape API" for nothing!).



That's the good news.  The bad news is that the httpd is coredumping on "getprotobyname".



#0  0x00d5057b in do_lookup_versioned () from /lib/ld-linux.so.2
#1  0x00d4f776 in _dl_lookup_versioned_symbol_internal () from /lib/ld-linux.so.2
#2  0x00d53473 in fixup () from /lib/ld-linux.so.2
#3  0x00d53330 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#4  0x006c76d4 in _nss_files_getprotobyname_r () from /lib/libnss_files.so.2
#5  0x00203752 in getprotobyname_r@@GLIBC_2.1.2 () from /lib/tls/libc.so.6
#6  0x009945a9 in Perl_pp_gprotoent () from /pkg/packages/apache-1.3.33/libexec/libperl.so



I hacked around it in a few places to make the site appear to work (and you can vote reviews as helpful or not helpful!), but you still can't add a new review. I'd keep hacking on it, but it's 4:30 and I have things to do Wednesday, so I can't stay up much longer... Zzzzzz...



update: it's working again.



search.cpan.org Uploads RSS 1.0 feed

Many people have not been able to find the RSS feed for CPAN uploads that search has been providing for quite some time. The only link was in the FAQ. It was also only in RSS 0.91 and did not contain much information.



Well, now there is a new feed at http://search.cpan.org/uploads.rdf, which is referenced in the header of each page so discovery tools can find it. There is also a link on the recent uploads page.
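For the curious, this is roughly how a discovery tool finds such a feed: it scans the page header for link tags marked rel="alternate". The sketch below is a crude text filter rather than a real HTML parser. feed_links is a made-up name, and the exact attributes search.cpan.org puts on its link tag are an assumption.

```shell
# feed_links (hypothetical helper): reads an HTML page on stdin and prints the
# href of every <link ... rel="alternate" ...> tag it can spot.
feed_links() {
    tr '\n' ' ' |                      # join lines so multi-line tags match
        grep -o '<link[^>]*>' |        # one <link ...> tag per output line
        grep -i 'rel="alternate"' |    # keep only feed-style alternates
        sed -n 's/.*href="\([^"]*\)".*/\1/p'
}
```

Piping a fetched page through feed_links should print the feed URL if the header link is there. It only handles double-quoted attributes, which is good enough for a quick check.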



Monday, July 18, 2005

cpanratings.perl.org broken

cpanratings.perl.org is in a bit of a mess.  Posting new reviews has been broken for a while and I just super-broke it.   I was working on a few new features and while I had a separate branch for the code I forgot to do the same for the templates.  Oops.



Instead of reverting it back I'll just press on to get the new stuff out in the next few days.  Thank you for your patience. :-)



update: almost fixed.



Wednesday, July 6, 2005

Subversion, RT, CVS and mail (not anymore) down

Oops. Our ancient server that's hosting SVN, CVS, RT and does our incoming mail filtering crashed.



We have plans to get Subversion and CVS moved to a dedicated box, but we haven't gotten to it yet. (Likewise for Request Tracker).



There was no output on the terminal server, so I power cycled it and it's rebooting now. Slooowly.



/home: Clearing orphaned inode 1197164 (uid=27, gid=27, mode=0100600, size=0)
/home: clean, 189065/1359872 files, 2282241/2714977 blocks
/usr: recovering journal
/usr: clean, 74363/320640 files, 388588/640591 blocks
/var: recovering journal
/var: Clearing orphaned inode 160130 (uid=101, gid=102, mode=0100744, size=16775271)
/var: clean, 2854/192000 files, 269571/383551 blocks

<-------------reiserfsck, 2001------------->
reiserfsprogs 3.x.0j
/dev/hda1: recovering journal
/dev/hda1 has gone 268 days without being checked, check forced.



update 10:05 PST: it's a minute from being back up now, whee.



Friday, July 1, 2005

Search perl.org - help needed

Some weeks ago I made a simple page to search stuff on *.perl.org. The reason we haven't told anyone or even made a link to the page is that it's Really Ugly. See for example a search for DBI.



The template for the search is in this file, there's also a file with a few lines of extra CSS. Use your perl.org login to access those files. Patches more than welcome!



It's using the XML interface to CPAN Search for the CPAN search results and the Yahoo Search API (via Yahoo::Search) for everything else. Oh, which reminds me – the search results page needs to mention that it's using the Yahoo Search API.



Did I say that patches are welcome?