GK-2A Troubleshooting

Overnight there was an issue which prevented the GK-2A images from being downloaded and / or processed. I spotted this when I checked what the latest image looked like and noticed that the images had last been updated at 8.34pm the previous evening.

So it was time to start troubleshooting, to learn what had gone wrong and, if possible, prevent it from happening again.

The first test was to see whether I could connect to the Raspberry Pi, and I found that I could SSH (Secure SHell) into it. So the Pi was up and running.

Next I went to where the process logging is collected and looked at the end of the log:

Logfile data

Here I can see that the full disk (FD) image was being downloaded and that 4 of the 10 slices the image is broken into for downloading had been processed. Then it just stopped.

When I checked for the image file, I found that it had not been generated at all.

I also did a quick check to make sure that the dish was still correctly pointed at the satellite. After all, something could have happened to the satellite, or the dish might no longer be pointed correctly.

Receive log

The key numbers to look at here are:

  • vit(avg) – a measure of the quality of the received data, where lower is better. A value under about 400 is enough for good reception, so values in the range 61-70 are excellent. So the satellite is broadcasting and it is being received well.
  • packets – how many data packets are being received; values around 7-8 show that everything is working as expected.
  • drops – how many of the received packets could not be processed, so values of 0 throughout show that everything is fine.
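The rules of thumb above can be turned into an automatic check. This is a minimal Python sketch that takes the three numbers as inputs rather than parsing the goesrecv log, since the exact log format isn't reproduced here. `MIN_PACKETS` is a hypothetical threshold, not something goesrecv defines; only the 400 limit and the zero-drops rule come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Rule of thumb from the post: vit(avg) below about 400 gives good reception.
VIT_LIMIT = 400
# Hypothetical floor, picked well under the "around 7-8" packets seen when healthy.
MIN_PACKETS = 5


@dataclass
class ReceiveStats:
    vit_avg: int  # quality figure from the receive log; lower is better
    packets: int  # packets received, as shown in the receive log
    drops: int    # received packets that could not be processed


def reception_problems(stats: ReceiveStats) -> List[str]:
    """Return a description of each problem found; an empty list means reception looks fine."""
    problems = []
    if stats.vit_avg >= VIT_LIMIT:
        problems.append(f"vit(avg) is {stats.vit_avg}, not below {VIT_LIMIT}")
    if stats.packets < MIN_PACKETS:
        problems.append(f"only {stats.packets} packets received")
    if stats.drops > 0:
        problems.append(f"{stats.drops} packets dropped")
    return problems


# Values in the range seen in the receive log above.
print(reception_problems(ReceiveStats(vit_avg=65, packets=8, drops=0)))  # []
```

Returning a list of problems, rather than a single true/false, means a log entry can say which figure went wrong.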

So everything looks right, yet there are no images. However, I could now rule out any problem with reception.

Given the abrupt stop part-way through an image, it looks like something went wrong while the xrit-rx application was processing the data to create the image.

I looked, but couldn’t find any application logging from xrit-rx, so all I can tell is that an unknown issue caused it to crash after processing slice 4 and before it could write out even a partial image.

The next step was to reboot the Pi and see whether that started everything up again so that data could be downloaded and processed. Then I waited for the next image to start downloading:

Logfile data

So yes, it is working again and a full image was received. I was just a bit impatient to see whether it was working, so I checked before the image had finished.

And the all-important check: did it result in an image file being created? It did.

Full Disk Image

So, how to prevent this from happening again?

The first thing I did was rather crude, but it will stop any future failure from costing too much data. I added a cron job that reboots the Pi every 12 hours, at 5 minutes past the hour (00:05 and 12:05). I chose this time because it won’t fall in the middle of receiving an FD image, and according to the download schedule no other data is normally scheduled for download at that point.
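For reference, a crontab entry for this schedule would look something like `5 0,12 * * * /sbin/shutdown -r now` in root’s crontab; the exact command used isn’t shown, so treat that line as an assumption. The Python sketch below works out when the next of those reboots happens after any given moment, for example after a failure shortly after 20:34, the last image time in this case. The date in the usage line is arbitrary.

```python
from datetime import datetime, timedelta

# Reboot times from the cron job: 00:05 and 12:05 every day.
REBOOT_TIMES = [(0, 5), (12, 5)]


def next_reboot(now: datetime) -> datetime:
    """Return the first scheduled reboot strictly after `now`."""
    candidates = []
    # Today's and tomorrow's slots are always enough to find the next one.
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for hour, minute in REBOOT_TIMES:
            candidates.append(datetime(day.year, day.month, day.day, hour, minute))
    return min(c for c in candidates if c > now)


# A failure at 20:34 is not recovered until 00:05 the next morning.
print(next_reboot(datetime(2024, 1, 1, 20, 34)))  # 2024-01-02 00:05:00
```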

This means that the most data I’ll lose is 12 hours’ worth, and on average around 6 hours. Not ideal, but better than waiting until I happen to notice.
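The 12-hour worst case and 6-hour average follow from the reboot times if a failure is equally likely at any minute of the day. This short Python sketch checks that arithmetic by trying every minute as a failure time and measuring how long it is until the next reboot. The uniform-failure assumption is mine; real failures may cluster.

```python
# Reboots at 00:05 and 12:05, expressed as minutes after midnight.
REBOOTS = [5, 12 * 60 + 5]
DAY = 24 * 60


def minutes_lost(failure_minute: int) -> int:
    """Minutes from a failure at `failure_minute` (0..1439) until the next reboot."""
    for reboot in REBOOTS:
        if reboot > failure_minute:
            return reboot - failure_minute
    return DAY + REBOOTS[0] - failure_minute  # next reboot is 00:05 tomorrow


losses = [minutes_lost(m) for m in range(DAY)]
print(f"worst case: {max(losses) / 60:.1f} h")              # worst case: 12.0 h
print(f"average: {sum(losses) / len(losses) / 60:.1f} h")   # average: 6.0 h
```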

Next I need to figure out how to identify the problem earlier.
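One way to notice sooner is to automate what gave the problem away in the first place: the age of the newest image. A minimal Python sketch follows; the directory path and the two-hour limit are hypothetical values to adjust to wherever xrit-rx actually writes its images and how often they normally arrive.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical values: where the decoded images are written, and how long
# without a new image counts as a problem.
IMAGE_DIR = Path("/home/pi/gk2a/received")
MAX_AGE_HOURS = 2.0


def newest_image_age_hours(directory: Path, now: Optional[float] = None) -> Optional[float]:
    """Hours since the most recently modified file under `directory`, or None if there are no files."""
    if not directory.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in directory.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return (now - max(mtimes)) / 3600


def images_are_stale(directory: Path = IMAGE_DIR, max_age_hours: float = MAX_AGE_HOURS) -> bool:
    """True if no image has been written within `max_age_hours`, or there are none at all."""
    age = newest_image_age_hours(directory)
    return age is None or age > max_age_hours
```

Run from cron, a `True` result could send an alert or trigger a restart instead of waiting for someone to spot that the images stopped updating.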

My first thought is to check that the apps doing the decoding / processing are running and, if not, restart them.
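A minimal watchdog sketch in Python, intended to be run from cron every few minutes. It assumes each app is started by a systemd unit; the unit names `xrit-rx.service` and `goesrecv.service` and the `systemctl restart` step are hypothetical, since how the apps are launched isn’t covered here. `pgrep -f` matches against the full command line, which helps when an app runs under an interpreter, as a Python script like xrit-rx does.

```python
import subprocess
from typing import Callable, Dict, List


def process_running(pattern: str) -> bool:
    """True if any process's full command line matches `pattern`."""
    # pgrep exits with status 0 when at least one process matches; it never matches itself.
    result = subprocess.run(
        ["pgrep", "-f", pattern], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def restart_unit(unit: str) -> None:
    """Hypothetical restart step: assumes the app is managed by systemd."""
    subprocess.run(["sudo", "systemctl", "restart", unit], check=True)


# Hypothetical mapping of process-name patterns to the units that start them.
WATCHED: Dict[str, str] = {
    "xrit-rx": "xrit-rx.service",
    "goesrecv": "goesrecv.service",
}


def check_and_restart(
    watched: Dict[str, str],
    is_running: Callable[[str], bool] = process_running,
    restart: Callable[[str], None] = restart_unit,
) -> List[str]:
    """Restart every watched unit whose process is missing; return the units restarted."""
    restarted = []
    for pattern, unit in watched.items():
        if not is_running(pattern):
            restart(unit)
            restarted.append(unit)
    return restarted
```

The check and restart steps are passed in as functions so the logic can be tried out without touching real processes; calling `check_and_restart(WATCHED)` uses the real `pgrep` and `systemctl` versions.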

So for xrit-rx:

xrit-rx process

Here we can see that it finds two processes: one is the actual process and the other is the grep command searching for it. So if the command returns two lines, xrit-rx is running; if it returns just one, it is not and needs to be restarted.
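Counting lines works, but it relies on the grep line always being there. A minimal Python sketch of the same check that drops the grep line explicitly, so any remaining match means the process is running. It assumes the text comes from something like `ps aux | grep xrit-rx`; the function names are illustrative.

```python
from typing import List


def matching_processes(ps_output: str, name: str) -> List[str]:
    """Return the lines of `ps` output that mention `name`, leaving out the grep that searched for it."""
    matches = []
    for line in ps_output.splitlines():
        if name not in line:
            continue
        # The `grep xrit-rx` command always matches its own search, so skip it.
        if "grep" in line.split():
            continue
        matches.append(line)
    return matches


def is_running(ps_output: str, name: str) -> bool:
    # One real match replaces the "two lines means running" rule.
    return len(matching_processes(ps_output, name)) > 0
```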

The second app is goesrecv:

goesrecv process

Here I was expecting to see just two lines again, but there are two goesrecv processes: one running as root and one as the pi user. The files created have their user / group set to pi, so I suspect that the process running as root shouldn’t be there. Could this be behind the issue?
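A duplicate like this is easy to spot by eye, but a watchdog could flag it too. A minimal sketch, assuming the input is the output of `ps aux` (columns USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) and that `pi` is the expected owner, as the file ownership suggests. It groups matching processes by user, ignoring the grep line, and reports any other owner; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def owners_of(ps_aux_output: str, name: str) -> Dict[str, List[int]]:
    """Map each user to the PIDs of their processes whose command mentions `name`."""
    owners: Dict[str, List[int]] = defaultdict(list)
    for line in ps_aux_output.splitlines():
        # `ps aux` has 10 fixed columns before COMMAND, which may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        user, pid, command = fields[0], int(fields[1]), fields[10]
        if name in command and not command.startswith("grep"):
            owners[user].append(pid)
    return dict(owners)


def unexpected_owners(owners: Dict[str, List[int]], expected_user: str = "pi") -> List[str]:
    """Users other than `expected_user` that are running the process."""
    return sorted(user for user in owners if user != expected_user)
```

Logging the result alongside each failure would show whether the extra root process lines up with the crashes.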

Will investigate and see what I can find…
