In the evening I implemented a 2-second connect timeout and exponential backoff for the retry timer. The first reconnect attempts happen within seconds, and the interval then grows to about 2 minutes between retries. Using a non-blocking connect() would have been the correct fix, but this was a bit quicker. The problem should not appear again in this form.
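As a rough illustration of the idea (not the actual code running on the server), here is a minimal Python sketch of a connect timeout combined with exponential backoff. The 2-second timeout and the roughly 2-minute ceiling come from the description above; the 1-second starting delay, the doubling factor, and the names `backoff_delays`, `try_connect` and `reconnect` are assumptions made up for this example.

```python
import socket
import time

CONNECT_TIMEOUT = 2.0   # seconds, as described in the post
INITIAL_DELAY = 1.0     # seconds, assumed starting point
MAX_DELAY = 120.0       # seconds, "about 2 minutes" ceiling
FACTOR = 2.0            # assumed growth factor


def backoff_delays(initial=INITIAL_DELAY, maximum=MAX_DELAY, factor=FACTOR):
    """Yield retry delays that grow exponentially up to a fixed ceiling."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def try_connect(host, port, timeout=CONNECT_TIMEOUT):
    """Attempt one blocking connect, giving up after `timeout` seconds."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return None


def reconnect(host, port, sleep=time.sleep):
    """Keep retrying until a connection succeeds, backing off between tries."""
    for delay in backoff_delays():
        sock = try_connect(host, port)
        if sock is not None:
            return sock
        sleep(delay)
```

With these assumed values the waits between attempts would be 1, 2, 4, 8, 16, 32 and 64 seconds, and then 120 seconds from there on. The connect call still blocks for up to 2 seconds each time, which is the trade-off compared to a non-blocking connect().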
It seems that no APRS data was lost or missed: it was collected in a buffer and processed once the connect attempts started working again. The following graph gives some idea of the relative changes in processing rate. At peak, about 10 megabytes of data was in the buffer.
