Multi-Tech Conduit hangs occasionally (downlink traffic)? #1

@terrillmoore

This issue documents a problem first reported on www.thethingsnetwork.org/forums, in the thread "Latest MultiTech Packet Forwarder Stops Sending Packets".

In brief: after updating from V2.x to V3.0.0-r5, we observe hangs of the packet forwarder. The easiest way to detect a hang is to compare the modification time of /var/log/lora-pkt-fwd.log with the current time:

root@mtcdt:~# ls -lrt /var/log/lora-pkt-fwd.log
-rw-r--r--    1 root     root        384148 Jun 19 12:31 /var/log/lora-pkt-fwd.log
root@mtcdt:~# date
Mon Jun 19 12:53:10 CST 2017

(Note the gap of about 22 minutes between the log's last modification and the current time.)

The forum thread may describe additional issues; this issue documents only those I have observed myself. I have seen similar symptoms on other gateways, but only caught a hang in the act yesterday.

Status of the packet forwarder (from ps au) was:

  516 ?        Sl    34:01 /opt/lora/mp_pkt_fwd -c /var/config/lora -l /var/log/

I believe that the Sl flags are significant.

Restarting the packet forwarder does not clear the problem.

root@mtcdt:~# /etc/init.d/ttn-pkt-forwarder restart
Stopping ttn-packet-forwarder: OK
Found MTAC-LORA-915 with MTAC-LORA-1.0 hardware
Starting ttn-packet-forwarder: OK
root@mtcdt:~#

The relevant portion of /var/log/lora-pkt-fwd.log (after the above restart) was:

INFO: [main] Starting the concentrator
lgw_connect:532: INFO: no FPGA detected or version not supported (v103)
Note: success connecting the concentrator
ERROR: SPI ERROR DURING REGISTER BURST READ
INFO: tx_start_delay=1497 (1497.000000) - (1497, bw_delay=0.000000, notch_delay=0.000000)

INFO: End of upstream thread
ERROR: SPI ERROR DURING REGISTER WRITE

Rebooting the Conduit with init 6 solved the problem.

During this hang, no traffic was forwarded upstream.

Both observed failures were correlated with downlink (join) traffic. (This application otherwise does not use downlink traffic, and there are no other known applications on the gateway.)

Gateway version info:

root@mtcdt:~# uname -a
Linux mtcdt 3.12.27 #1 Thu May 5 18:53:23 CDT 2016 armv5tejl GNU/Linux

The workaround of last resort is to run a daemon that watches the timestamp on the log and reboots the Conduit if the log becomes more than one or two minutes old.
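
A minimal sketch of such a watchdog follows. It is not a tested fix for the Conduit. The function names, the 120-second threshold, the 30-second polling interval, and the `run` argument are all assumptions made for illustration; only the log path and the use of `init 6` come from the report above. It also assumes that `date -r FILE` (available in GNU coreutils and BusyBox) works on the gateway.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: reboot when the packet forwarder log goes stale.
# All names and default values below are illustrative assumptions.
LOG_FILE="${LOG_FILE:-/var/log/lora-pkt-fwd.log}"
MAX_AGE="${MAX_AGE:-120}"              # seconds of silence tolerated
CHECK_INTERVAL="${CHECK_INTERVAL:-30}" # seconds between checks

# Print the age of file $1 in seconds; fail if it cannot be read.
log_age() {
    now=$(date +%s)
    mtime=$(date -r "$1" +%s 2>/dev/null) || return 1
    echo $((now - mtime))
}

# Succeed if file $1 was last modified more than $2 seconds ago.
# A missing log counts as "not stale" so that a gateway whose
# forwarder has not yet created its log does not reboot in a loop.
is_stale() {
    age=$(log_age "$1") || return 1
    [ "$age" -gt "$2" ]
}

# Poll forever, rebooting once the log is older than MAX_AGE.
watch_log() {
    while :; do
        if is_stale "$LOG_FILE" "$MAX_AGE"; then
            echo "watchdog: $LOG_FILE is stale, rebooting" >&2
            init 6
        fi
        sleep "$CHECK_INTERVAL"
    done
}

# Start the loop only when invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
    watch_log
fi
```

Saved as, say, `pktfwd-watchdog.sh`, it would be started as `sh pktfwd-watchdog.sh run &` from a boot script. On a gateway with little traffic the forwarder may legitimately write nothing for longer than two minutes, so `MAX_AGE` would need to be tuned to the expected logging rate.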
