Tuesday, March 24, 2009

Some database cleanups

Since the popularity of the site is increasing I need to do some optimizations to keep the site running smoothly.

As announced before, I ran some database restructuring batch jobs during the weekend. The last one was actually started on Monday morning and the execution was completed early Tuesday morning. I hope the slowdowns were not too noticeable, I tried to make them run slowly enough to not interfere with normal use.

I'm currently looking at ways to clean up some unused data from the aprs.fi database. There's quite a lot of automatically or erroneously generated targets stored in the database, most of which are not really looked at by anyone. They just inflate my tables and make the prefix browsing view very cluttered. The performance impact caused by the extra targets isn't really noticeable, but it's just not very nice to have them there.

If you're keeping an eye on the statistics, you might have noticed that today I deleted about 60000 targets. Those were all APRS objects, which had only a single point stored (non-moving), which were last announced over a week ago, and which were announced for less than 12 hours. Almost all of them were automatically generated earthquake and severe weather warning objects, which don't really need to be here any more.

There's also an awful lot of regular APRS targets which have a corrupted source callsign. Most usually one or two characters have been lost from the beginning of the callsign. For example: 1VAJ (J41VAJ), 3GXT-2 (W3GXT-2) and 3TVX-9 (VE3TVX-9). You can find these by looking at the prefix browsing view, picking a strange prefix starting with a number, and looking up the correct original call in the "nearby stations" list of the info page. I wonder how the callsigns get mangled like this, and how could I remove these from the database without accidentally removing some valid data.


Maescool said...

aren't there regexes to check the calls.. or wasn't there a list to check the validity.. like the programs we use on contests :)
+ wasn't the standard norm to just store for 10 days?
PS. check out memcache for optimalisation ;) i use it on a high traffic site, brings the load way down!

Hessu said...

Maescool: There are a lot of APRS items and objects, and tactical callsigns (like airplane/ship callsigns), in completely valid use, so a regexp to only match amateur callsigns is out of the question.

10 days? What standard? :) aprs.fi is currently storing a year's worth of history, which is a nice feature.

I'm already using memcache for some tasks, of course, but it's a bit tricky to use for finding a set of targets in a specific area. It's very quick because it doesn't provide things like range searches. Most of my load is currently caused by writes (inserts, updates, deletes), the reads are mostly cached already, one way or another.

Maescool said...

-oh didn't think of that :$

-Wasn't that the standard? on the aprs network? or am i so wrong? i remember reading that somewhere a couple of years ago.. when i first started experimenting on aprs

-great to hear :)
I'm also struggling to make caching better coordinates based.. not easy..
(not aprs.. google maps and some media data..)

further i see you have a cisco/juniper network.. that's the best ;)


Hessu said...

10 days might have been a limitation of some other implementation, or maybe a specification by Bob. But I think it's a better feature to store for a longer time and let the user decide which time range to view.

(M series.)

LA3QMA said...

Strange callsign is one problem.

But i also see callsigns like "WIDE" etc

And people using UI-View using the unproto wrong.
UI-View has to use the word APRS before the path. So if you see things like WIDE1-1 via WIDE2-2 instead of APU25N via WIDE1-1,WIDE2-2 then this is most likely a ui-view user.

So instead of dropping those packets is it possible to make a link with a list of non valid packets? Or do you just want them out of the sql database?

Hessu said...

LA3QMA: Strange destination callsigns (like a digipeater path in the place of the destination call) are not much of a problem for aprs.fi, it only causes problems for the individual's transmitted packets if the digipeater path is wrong.

Corrupted source callsigns are more of a problem, since they clutter up the prefix browsing view, and generate a lot of rows in the database, rows which are never really referenced. And it's a bit hard to differentiate them from valid short-lived objects.

Anonymous said...

Maybe a bit on the side, but please enlight me:
I have a couple of objects I want to get rid of, but how do I do it?
test5oea, TEST5OEA2, Halten2 - do they time out in some way? The objects are deleted in my Xastir map, but still show up in aprs.fi.

Btw. do you have a way of reporting bugs?
I think I found a minor bug in the Status view: the status for my electric car (http://aprs.fi/?c=status&call=LA5OEA-5) shows the time twice instead of date and time (I guess) e.g.:
15:54:56 - 15:54:56 UTC: LA5OEA-5: 3595Km; 58%; 41.2DAh;12.50...

W6DTW said...

Maybe I'm being draconian, but I suggest you remove the corrupted callsigns from the database, since they're not technically "valid data" anyway.