Thursday, January 24, 2008

The 2-hour outage last night

After adding 4 GB of memory to the server (which went well, as usual), I had to rearrange some disks, which took a while to complete. After the reboot, the Apache web server no longer returned web pages correctly: web browsers showed the HTML source code instead of nicely formatted pages. After playing with a network analyzer (thanks, Wireshark) and getting a little help from an ex-colleague, I finally got it fixed.

I had upgraded a good bunch of libraries on the system earlier, but hadn't really restarted the web server processes afterwards (I only run a graceful restart after log rotation and config changes), so the web server was still using the old libraries. After the reboot, the new libraries were linked in, and it broke. There was an extra line break (\r\n) after the Server: Apache... HTTP header. Since an empty line ends the header block, web browsers promptly stopped parsing the headers there, assumed a text/plain content type, and showed the HTML source.
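To illustrate why a single stray CRLF is enough to break things, here is a minimal Python sketch of how a client splits a response at the first empty line. The sample response bytes and the split_response helper are made up for illustration (they are not a capture from the actual server), and real browsers are of course more elaborate than this.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body).

    The header block ends at the first empty line (CRLF CRLF);
    everything after it is treated as the body.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    status_line = lines[0].decode("latin-1")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.decode("latin-1").strip().lower()] = (
            value.decode("latin-1").strip()
        )
    return status_line, headers, body


# Hypothetical response with the extra CRLF right after the Server header.
broken = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6\r\n"
    b"\r\n"  # the stray line break: the header block ends here
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

status, headers, body = split_response(broken)
print(sorted(headers))   # only the Server header was parsed
print(body[:17])         # the remaining headers landed in the body
```

With no Content-Type left in the parsed headers, the client has to guess, and the remaining headers plus the HTML end up being displayed as the page content.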

The strange thing was that enabling content-encoding compression (SetOutputFilter DEFLATE) fixed the problem. When it worked, the header was:

Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8g

When it didn't work, the header was:

Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6
(+ the extra CRLF, then Connection: close and the following headers)

The workaround was to set ServerTokens Prod, so that the header only says Apache, and doesn't give any hints of version numbers or modules. My best guess is that disabling DEFLATE changed library linking order so that the openssl version query returned "\r\n". gnutls maybe?

The sysadmin lesson to learn here is probably that you should restart the services that depend on libraries you have upgraded, so that any incompatibility is revealed right away. It's nice to know which upgrade broke what. It's not fun to see applications break after "just a reboot".
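One way to spot services that are still running on old libraries is to look for shared objects that a process has mapped but which have since been replaced on disk. The sketch below assumes a Linux-style /proc filesystem, where such mappings show up in /proc/<pid>/maps with a " (deleted)" suffix; the function names are my own, and on other Unix systems a different approach would be needed.

```python
import os


def deleted_libraries(maps_text: str) -> set:
    """Return paths of shared libraries that are mapped but deleted on disk.

    Expects the text of a Linux /proc/<pid>/maps file.
    """
    suffix = " (deleted)"
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.endswith(suffix) and ".so" in path:
            libs.add(path[: -len(suffix)])
    return libs


def scan_proc(proc_root: str = "/proc") -> dict:
    """Map each readable PID to the deleted libraries it still uses."""
    results = {}
    if not os.path.isdir(proc_root):
        return results  # not a Linux-style system
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as f:
                libs = deleted_libraries(f.read())
        except OSError:
            continue  # process exited, or no permission to read it
        if libs:
            results[int(entry)] = libs
    return results
```

Running scan_proc() as root right after an upgrade would list the processes that need a full restart, rather than finding out after the next reboot.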

On the positive side, the system isn't swapping any more.


Anonymous said...

I pulled out my APRS radio after not using it for many weeks/months and got to looking around on the site. It looks like there is one point for each day of the month, for the last few months, on a call sign that has not been used during that time. When you choose one of those points to see where it is, nothing shows up on the map. FYI - N7ZMR-7

Hessu said...

This relates to how position history data is stored. The database contains one row for each new position, and each row stores two timestamps: the time when the station first reported from that position, and the last time it reported from there before moving to a new one.

This saves a lot of storage space, and speeds things up, since a lot of stations never move. So, only a single row is stored for those stations. And the moving stations often spend a while in the same location before moving again.
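As a rough illustration of that storage scheme, here is a small Python sketch. The PositionRow class and record_report function are hypothetical names made up for this example, not the actual database code: a report from the same position only updates the "last seen" timestamp of the latest row, and a new row is added only when the station moves.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PositionRow:
    lat: float
    lon: float
    first_seen: datetime  # first report from this position
    last_seen: datetime   # latest report from this position


def record_report(history: list, lat: float, lon: float, when: datetime) -> list:
    """Store a position report, reusing the latest row if the station hasn't moved."""
    if history and (history[-1].lat, history[-1].lon) == (lat, lon):
        history[-1].last_seen = when  # same place: just extend the row
    else:
        history.append(PositionRow(lat, lon, when, when))  # moved: new row
    return history


history = []
record_report(history, 60.17, 24.94, datetime(2008, 1, 1, 12, 0))
record_report(history, 60.17, 24.94, datetime(2008, 1, 15, 12, 0))
record_report(history, 60.17, 24.94, datetime(2008, 1, 24, 12, 0))
print(len(history))  # three reports, still a single row
```

A station that never moves costs one row no matter how many times it reports.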

The way this shows in the date browsing menu of the map is not good: it shows one point for each day between the "start" and "end" times. But I'm not sure what would be the correct thing to show there. To the best of the database's knowledge, the station was at that location on that day.
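To show where the one-point-per-day effect comes from, here is a tiny Python sketch. The days_covered function is a hypothetical helper, not the real menu code: given only the "start" and "end" times of a row, every day in between looks like a day when the station was there, whether or not it actually transmitted.

```python
from datetime import date, timedelta


def days_covered(first_seen: date, last_seen: date) -> list:
    """List every calendar day from first_seen to last_seen, inclusive."""
    span = (last_seen - first_seen).days
    return [first_seen + timedelta(days=n) for n in range(span + 1)]


# A row with start 2008-01-01 and end 2008-01-04 yields four days,
# even if the station only reported on the first and the last one.
days = days_covered(date(2008, 1, 1), date(2008, 1, 4))
print(len(days))
```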

If nothing shows up on the map when clicked - that's a bug. I'll try to get that fixed soon.