This issue documents a problem that was first reported on www.thethingsnetwork.org/forums: Latest MultiTech Packet Forwarder Stops Sending Packets.
The brief form: after updating to V3.0.0-r5 from V2.x, we observe hangs of the packet forwarder code. The easiest way to detect this is to look at the timestamp on /var/log/lora-pkt-fwd.log:
root@mtcdt:~# ls -lrt /var/log/lora-pkt-fwd.log
-rw-r--r-- 1 root root 384148 Jun 19 12:31 /var/log/lora-pkt-fwd.log
root@mtcdt:~# date
Mon Jun 19 12:53:10 CST 2017
(Note the difference in time: the log was last written at 12:31, about 22 minutes before the current time.)
The forum thread may describe additional issues; this thread only documents those observed by me. I have seen symptoms like this on other gateways, but only finally caught things in the act yesterday.
Status of the packet forwarder (from ps au) was:
516 ? Sl 34:01 /opt/lora/mp_pkt_fwd -c /var/config/lora -l /var/log/
I believe that the Sl flags are significant.
Restarting the packet forwarder does not clear the problem.
root@mtcdt:~# /etc/init.d/ttn-pkt-forwarder restart
Stopping ttn-packet-forwarder: OK
Found MTAC-LORA-915 with MTAC-LORA-1.0 hardware
Starting ttn-packet-forwarder: OK
root@mtcdt:~#
The relevant portion of /var/log/lora-pkt-fwd.log (after the above restart) was:
INFO: [main] Starting the concentrator
lgw_connect:532: INFO: no FPGA detected or version not supported (v103)
Note: success connecting the concentrator
ERROR: SPI ERROR DURING REGISTER BURST READ
INFO: tx_start_delay=1497 (1497.000000) - (1497, bw_delay=0.000000, notch_delay=0.000000)
INFO: End of upstream thread
ERROR: SPI ERROR DURING REGISTER WRITE
Rebooting the Conduit with init 6 solved the problem.
During this hang, no traffic was forwarded upstream.
Both observed failures were correlated with downlink (join) traffic. (This application otherwise does not use downlink traffic, and there are no other known applications on the gateway.)
Gateway version info:
root@mtcdt:~# uname -a
Linux mtcdt 3.12.27 #1 Thu May 5 18:53:23 CDT 2016 armv5tejl GNU/Linux
The desperation workaround is to have a daemon watching the timestamp on the log, and if it becomes older than one or two minutes, to reboot the Conduit.
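The log-watching workaround could be sketched roughly as follows. This is a minimal sketch, not a tested fix: the function names, the 120-second threshold, and the 30-second polling interval are all illustrative assumptions, and it assumes the gateway's `stat` supports `-c %Y` and its `date` supports `+%s` (true of GNU coreutils and of BusyBox builds with format support enabled). A missing log file is deliberately not treated as stale, to avoid a reboot loop before the forwarder has written anything.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and thresholds are illustrative only.

# Print the age of a file in seconds (current time minus mtime).
# Returns non-zero if the file cannot be stat'ed.
log_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    echo $((now - mtime))
}

# Succeed (exit 0) only if the file exists and is older than $2 seconds.
# A missing file is NOT considered stale (assumption: avoids reboot loops).
is_stale() {
    age=$(log_age_seconds "$1") || return 1
    [ "$age" -gt "$2" ]
}

# Poll the log forever; reboot the gateway when the log goes stale.
# Arguments: log path, max age in seconds (default 120),
# polling interval in seconds (default 30).
watch_log() {
    log=$1
    max_age=${2:-120}
    interval=${3:-30}
    while :; do
        if is_stale "$log" "$max_age"; then
            echo "watchdog: $log is stale, rebooting" >&2
            init 6   # the reboot method that cleared the hang in this report
        fi
        sleep "$interval"
    done
}
```

To use it, one would append a call such as `watch_log /var/log/lora-pkt-fwd.log 120` and start the script from an init script. Since restarting the packet forwarder alone did not clear the SPI errors, the sketch goes straight to `init 6` rather than attempting a service restart first.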