Friday, March 21, 2008

On duplicate and delayed packets

I've had a few questions about how my APRS site detects duplicate packets and position packets which arrive in the wrong order. I've responded to the questions over email, but maybe it'd be useful to write something down here for a wider audience.

I will present some background first, most of which is not news to anyone who has followed aprssig or APRS-IS traffic for a while.

Packets sent by APRS stations are usually received by multiple igates. A packet might be heard directly by one igate, and it might be heard much later again after a few digipeater hops. As a result, the APRS-IS network servers will receive multiple copies of each packet. To reduce load on the network, they try to catch the duplicates and discard them. javAPRSSrvr stores packets for 30 seconds and discards any duplicates received during that time.
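
As a rough illustration of that kind of time-window duplicate check, here is a minimal Python sketch. It is not javAPRSSrvr's actual code. The class name, the choice of key (source callsign plus payload, ignoring the digipeater path) and the example callsign are assumptions made for the example; only the 30-second window comes from the description above.

```python
import time


class DuplicateFilter:
    """Remember recently seen packets and flag repeats inside a time window."""

    def __init__(self, window_seconds=30.0):
        self.window = window_seconds
        self.seen = {}  # (source, payload) -> time the packet was first seen

    def is_duplicate(self, source, payload, now=None):
        now = time.monotonic() if now is None else now
        # Forget entries that have fallen out of the window.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
        # Assumption: the path is left out of the key, because copies of the
        # same packet heard via different digipeaters carry different paths.
        key = (source, payload)
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


dupes = DuplicateFilter()
print(dupes.is_duplicate("N0CALL-9", "!6000.00N/02500.00E>", now=0.0))
print(dupes.is_duplicate("N0CALL-9", "!6000.00N/02500.00E>", now=10.0))
print(dupes.is_duplicate("N0CALL-9", "!6000.00N/02500.00E>", now=45.0))
```

The third call shows the weakness: a copy arriving 45 seconds after the first one is no longer recognised as a duplicate.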

But some duplicates still get through, because sometimes it takes more than 30 seconds for the packets to get to APRS-IS. Here are a couple of scenarios that cause this:

1. There are some broken digipeaters out there, which can buffer packets for quite some time before retransmitting them, due to a busy channel or a broken squelch/DCD circuit. A digipeater should not buffer a packet for more than a few seconds before transmitting it. If it can't spit the packet out rather quickly, it should discard the packet.

2. There are some igates which seem to buffer the packets before getting them on the APRS-IS. This is generally not a bug in the igate software or a problem in its configuration. Instead, the delay can be caused by congestion or packet loss on the Internet link between the igate and the APRS-IS server it is connecting to.

If there is even a little packet loss (due to congestion or a bad ADSL line, for example), the TCP connection between the igate and the APRS-IS server will do retransmissions to make sure everything gets through. When packets are retransmitted, the operating system's TCP stack will assume that the link is congested, and to reduce the congestion, it will slow down its transmission rate (back off). Exponential retransmit timer backoff is used - if multiple transmissions are lost, the retransmit timer is doubled after each loss, up to a fixed limit (which is usually around 60 seconds). As a result, it can take minutes for an APRS packet to get to the APRS-IS from an igate. (I simplified the retransmission timer stuff on purpose. Congestion control is actually more complicated than that nowadays, but the basic issue remains.)
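
To get a feel for how quickly the doubling adds up, here is a small Python sketch of the simplified timer described above. The function name, the 1-second initial timer and the count of 8 lost transmissions are assumptions picked for illustration; the 60-second cap is the "usual" limit mentioned above. Real TCP stacks differ in the details.

```python
def retransmit_schedule(initial_rto=1.0, max_rto=60.0, losses=8):
    """Return (timer values, total delay) for consecutive lost transmissions."""
    timers = []
    rto = initial_rto
    for _ in range(losses):
        timers.append(rto)
        # Double the timer after every loss, but never past the cap.
        rto = min(rto * 2, max_rto)
    return timers, sum(timers)


timers, total = retransmit_schedule()
print(timers)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(total)   # 183.0
```

Under these assumptions, eight lost transmissions in a row hold a packet back for about three minutes, which is far longer than a 30-second duplicate window.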

This is how TCP works - it is doing exactly what it was designed to do. Maybe we should be using UDP on the igate uplinks, or SCTP (on platforms which support it). I personally think we can live with losing some packets on the Internet, since we're losing many more packets on the radio path anyway. When TCP is used, maybe there could be a ping-pong method to measure the latency between the server and the igate, and to drop the link if the round-trip time goes anywhere near 20 seconds. This would, of course, eat some more bandwidth.

The visible effect of delayed packets is that a packet transmitted from a moving car 64 seconds ago might arrive later than a packet that was transmitted 4 seconds ago. If the car was moving at 100 km/h (27.8 m/s, 62 MPH), and the delayed packet went unnoticed and was stored in the database as the latest good position, the car would suddenly jump back about 1.7 km on its track. When the next packet was received, it'd quickly jump forward to its current position.

Without additional filtering the above scenario occurs very often, and it looks very ugly on the map. The hard part is filtering the old packets and duplicates without losing too much valid data.

APRS position packets don't usually have timestamps. Some packets do have timestamps, but even then, the timestamps are not very usable - they are sometimes lacking a date, they are often sent in some timezone other than UTC (and there's no way to know which), and even though a GPS unit would provide very accurate time for free, the clocks of many APRS stations are still off (many are transmitting a fixed location without a GPS unit). It's a bit hard to tell whether a timestamp is usable or not, so the site is not currently using them for anything. It's simply using the time the APRS packet was received from the APRS-IS.

Since the timestamps are not very usable, it is hard to tell the original order the packets were transmitted in. Even if the packets had good timestamps or sequence numbers, sorting them afterwards (in the database) might still have a significant performance impact. So, for now, I'm using them in the order they were received.

Here's what the site is currently doing:
  1. Packets having a latitude or longitude of 0 are dropped (usually a bad GPS fix).
  2. The distance between the previously accepted position and the new position is calculated. If it's less than one meter, the "last seen" timestamp of the previous position is updated and no new point is inserted in the database.
  3. If the previous packet was received within 5 seconds, the new packet is discarded.
  4. The speed required to move from the position indicated by the previously received packet to the position indicated by the new packet is calculated. Arrival times of the packets are used for the calculation, so this calculation does not accurately reflect the true ground speed of the target. If the calculated speed is over 500 km/h (311 MPH), the new packet is discarded. This gets rid of a lot of jumps caused by bad GPS fixes and out-of-order packets.
  5. The latitude,longitude,course,altitude set of the new packet is compared to the sender's previously accepted packets over the last 30 minutes. If an exact match is found, the new position is dropped as a duplicate. This step catches duplicates, while the previous step gets packets which are delayed or just far off.
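
The five steps above can be sketched roughly like this in Python. This is an illustration under assumptions, not the site's actual code. The names (Track, filter_position, the verdict strings), the haversine distance formula, and the choice to measure both the 5-second gap and the speed against the last accepted position are all mine. Only the thresholds come from the list.

```python
import math
from dataclasses import dataclass, field

MAX_SPEED_KMH = 500.0   # step 4
MIN_INTERVAL_S = 5.0    # step 3
MIN_MOVE_M = 1.0        # step 2
HISTORY_S = 30 * 60     # step 5


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine, spherical Earth)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


@dataclass
class Track:
    history: list = field(default_factory=list)  # accepted (t, lat, lon, course, alt)
    last_seen: float = 0.0


def filter_position(track, t, lat, lon, course=None, alt=None):
    """Return a verdict string; t is the arrival time in seconds."""
    if lat == 0 or lon == 0:                      # step 1
        return "drop: zero coordinate"
    if track.history:
        pt, plat, plon, _, _ = track.history[-1]  # last accepted position
        dist = distance_m(plat, plon, lat, lon)
        if dist < MIN_MOVE_M:                     # step 2
            track.last_seen = t
            return "update: not moved"
        dt = t - pt
        if dt < MIN_INTERVAL_S:                   # step 3
            return "drop: too soon"
        if dist / dt * 3.6 > MAX_SPEED_KMH:       # step 4, m/s -> km/h
            return "drop: too fast"
        for ht, hlat, hlon, hcourse, halt in track.history:  # step 5
            if t - ht <= HISTORY_S and (hlat, hlon, hcourse, halt) == (lat, lon, course, alt):
                return "drop: duplicate"
    track.history.append((t, lat, lon, course, alt))
    track.last_seen = t
    return "accept"


car = Track()
print(filter_position(car, 0, 60.000, 25.0))   # first position
print(filter_position(car, 60, 60.015, 25.0))  # about 100 km/h northbound
print(filter_position(car, 66, 60.000, 25.0))  # delayed copy of the first packet
```

In this run the delayed copy arrives 6 seconds after the newer position and would need roughly 1000 km/h to be real, so the speed limit throws it out. A real implementation would also prune history entries older than 30 minutes instead of keeping them forever.
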
These tricks (especially the speed limit) have some drawbacks. If you have a tracker on a jet aircraft or a satellite, you're out of luck. But I don't think any of the APRS satellites announce their position anyway - since they're moving very fast, they'd have to transmit the position very often for it to be useful, and it's probably better to use the channel bandwidth for relaying packets from ground stations and calculate the position of the satellite using tracking software.

If you fly overseas on a commercial jet and restart your tracker soon after the flight, it'll take some time before your new position is accepted. A typical passenger aircraft has a cruising speed of 800-900 km/h, so a 3-hour flight covers 2400-2700 km. At the 500 km/h limit that distance takes about 5 hours of elapsed time, so it can take another 2 hours or more after landing before you'll be on again.

The above algorithm is a compromise. It loses some good data, but most of the time it seems to get rid of a lot of old data and clean up the map display a lot. It can be tuned in either direction, but with the current protocols it cannot be made perfect.

There are a couple of things you can do to reduce the risk of duplicate and out-of-order packets from your tracker:
  • Do not use a very long digipeater path. A couple hops (WIDE1-1,WIDE2-1 or WIDE2-2, or whatever is recommended in your country) should usually be enough. Using a longer path will increase the possibility of out-of-order packets due to a longer, variable time spent during the digipeating.
  • Don't send packets very often. It's usually enough to send packets once per minute or two when you're moving, and once per 20-40 minutes when you're not. Smart Beaconing and proportional pathing are good things.
If you run an igate, please use the ping command to verify that your Internet uplink does not have any significant packet loss.

If you managed to read this far - thank you for your attention!


Keith VE7GDH said...

One other cause of delays is a KPC3+ being used in KISS mode by an IGate. It has been proven that after a week or so, it can introduce delays of about 1 1/2 minutes or so.

Andrew Sutherland said...

Hi Heikki,

Thank you for this information, it's helped me understand further. I realize this post is quite old (2009), I was wondering if the TIMESTAMP is still ignored by the site or if it's now being used. My TinyTrak4 has the ability to send a TIMESTAMP and to format the time and specify the time zone etc.

I'm currently getting the "B - C - A - D" incorrect "jumping" around on the map you described above. I wasn't getting it earlier this month, I BELIEVE it may have started when I decreased the wait between packet sends. I'm going to try reducing and trying again. I'm currently using SmartBeaconing in an automobile, and my paths are often twisty (due to mountainous backroads)

thanks kindly!