I’ve got multiple software watchdogs in place which monitor the different satellites to try to spot issues and to let me know so that I can take corrective action. My assumption is that normally problems are at my end.
A typical example is the email I got just over an hour ago, letting me know that I’d got a problem with GK-2A reception: it had been 42 minutes since I’d last received an image, which exceeded my threshold of 40 minutes.
I’d got the email on my phone and being home, I went to check on what was happening on the Pi which does the decoding.
The first place to look is the watchdog log file, as it captures the processing. The watchdog first checks whether the xrit code (which does the image processing) is running, and then checks whether any images have been captured. If needed, it’ll first try to resolve things by restarting the xrit code, but if that doesn’t work, it takes the more drastic option of rebooting the Pi. This hadn’t worked, as we can see from these log entries, which were repeated multiple times as my watchdog tried all it could to resolve the problem.
2021-04-03 14:33:56,958 - watchdog - DEBUG - xrit-rx.py is running
2021-04-03 14:33:57,185 - watchdog - DEBUG - goesrecv is NOT processing images
2021-04-03 14:33:57,185 - watchdog - DEBUG - goesrecv is not processing
2021-04-03 14:33:57,204 - watchdog - DEBUG - rebooting the Pi
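The escalation described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not my actual watchdog code: the function name `choose_action`, its parameters and the returned action strings are all made up for the example, and the only value taken from this post is the 40 minute threshold.

```python
from datetime import datetime, timedelta

# Threshold from the post: alert if no image has arrived for more than 40 minutes.
THRESHOLD = timedelta(minutes=40)


def choose_action(decoder_running: bool,
                  last_image: datetime,
                  now: datetime,
                  restart_already_tried: bool) -> str:
    """Pick the next watchdog step (hypothetical helper, not the real watchdog)."""
    images_recent = (now - last_image) <= THRESHOLD
    if decoder_running and images_recent:
        return "ok"
    if not restart_already_tried:
        # Gentle option first: restart the decoder process.
        return "restart_decoder"
    # The restart did not help, so fall back to the drastic option.
    return "reboot"


# Illustrative timestamps: the last image arrived 42 minutes before the check.
now = datetime(2021, 4, 3, 14, 33, 57)
last_image = now - timedelta(minutes=42)
print(choose_action(True, last_image, now, restart_already_tried=False))
print(choose_action(True, last_image, now, restart_already_tried=True))
```

Keeping the decision in a pure function like this, separate from the code that actually restarts processes or reboots, makes it easy to check the escalation order without touching the Pi.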
Looking at the receive log, I could see that there were no issues with the down link: a vit(avg) of just under 40 (under about 200 is fine), multiple packets being received and zero drops. That combination told me that the antenna / GOES SAWbird / RTL-SDR / Raspberry Pi chain was working well.
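The same rule of thumb can be written down as a tiny check. This is a sketch under assumptions: the function name and arguments are hypothetical, the numbers have to be read from the receive log by some other means, and the limit of 200 is just the rough "under about 200 is fine" figure mentioned above.

```python
def downlink_healthy(vit_avg: float, packets: int, drops: int,
                     vit_limit: float = 200) -> bool:
    """Rough down link health check (hypothetical helper).

    Healthy means: average Viterbi error count under the limit,
    at least one packet received, and no dropped packets.
    """
    return vit_avg < vit_limit and packets > 0 and drops == 0


# Illustrative values only: vit(avg) just under 40, some packets, zero drops.
print(downlink_healthy(vit_avg=39, packets=100, drops=0))
```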
However looking at the process log, all I saw was that the connection was idle.
Idle means that there is no data which can be decrypted.
Jumping onto a Discord channel where I connect with some other people who download satellite data, I was hoping to see if someone else was able to receive the data to help me troubleshoot the problem, but the only person with a GK-2A receiver had a problem with the power supply for their Pi, so I was on my own.
I looked at the web portal which shows the schedule and noticed something odd, with the FD 001 image being scheduled for 01:50:06 UTC, which is significantly later than normal. I validated this by checking for when it was received yesterday, which was 00:12 UTC for the file time stamp, which is about normal. So today’s 001 was expected about 100 minutes later than normal.
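To put a number on "about 100 minutes", here is the arithmetic as a small Python snippet. The helper name `delay_minutes` is made up for this example, and because the file time stamp only gives 00:12 to the minute, the seconds are assumed to be zero.

```python
from datetime import datetime


def delay_minutes(usual: str, scheduled: str) -> float:
    """Minutes between two same-day UTC times given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(scheduled, fmt) - datetime.strptime(usual, fmt)
    return delta.total_seconds() / 60


# Yesterday's FD 001 file time (seconds assumed) vs. today's scheduled time.
print(round(delay_minutes("00:12:00", "01:50:06")))  # 98
```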
Watching the process log, I saw an image arrive, which was as per the schedule (first line), so I knew that things were at least returning to normal with an expected file arriving as per the schedule.
However FD001 never arrived on time, so something wasn’t totally right.
Looking at the last full disk (FD) file which arrived, it was clear that whilst it had started to be received, only a partial image was received at 00:20 UTC. But from the schedule, FD002 shouldn’t be received until 02:00:56 UTC, which didn’t make sense.
I was prompted to check the .ant file, which is a file that is downloaded daily that has information including expected outages, but there had been no file downloaded today and the one from the previous day had nothing notable.
Continuing to watch the process log, I saw an image start to be received, which was FD002 for the second time (which never usually happens, as the image number always increments).
And there were two FD002 images received, which is the first time I’ve ever seen that happen.
Keen observers will note that there was an FD143 file. Whilst that image was scanned by the satellite the previous day, by the time it has been processed on the satellite and downloaded, it appears in the directory for the next day, so this is perfectly normal.
Looking at the new FD002, everything looks to be back to normal.
So for now, it seems to be back to working normally, with multiple images downloading with zero problems.
Now the more complex question is what the root cause of the issue was. Given that there was a long break with no images downloaded, and no issues found with my reception / decoding, it seems unlikely that the issue is with my hardware / software. When you also consider the unusual schedule for the FD images, the partial reception of FD002 and then another FD002 being received much later, the most likely explanation is that something happened at the GK-2A satellite end. Unless the GK-2A satellite operator, the Korea Meteorological Administration (KMA), discloses an issue, it will probably remain a mystery.
However it was fun trying to debug this in real time, not knowing the root cause of the problem and expecting that it would be something on my end that had gone wrong. But then I’ve seen issues with satellites before including NOAA 15 and GK-2A.
And the watchdog confirmed everything was back to normal.