How to restart ModbusTCP device/driver after loss of comms?

I have the following setup:
  • 4PPC70 PLC+HMI combo,
  • connected via Ethernet to a MikroTik omni antenna,
  • which talks to three Wi-Fi antennas at small pumping stations on company land.

In these pumping stations I use an X20BC0087 + CM8281 for basic data collection via Modbus TCP.

I hoped Modbus TCP would be fairly resilient to signal loss, i.e. that communication would be restored as soon as the signal came back.
I can ping all antennae and all X20BC0087 modules.

All was working well until I tested turning off the omni antenna for a few minutes.

In AS, comms are completely dead:


Configuration as follows:

Do you see what I am missing? Thanks

I would have expected the master to pick up the connection automatically. Did you check the logger for any entries? Which Automation Studio and Runtime versions are you using?

Stephan

After the “plug and play” timer expires, the master should try to re-establish communication. Normally there is no need to restart Modbus (which is not even possible) or the PLC.

I power-cycled two of the three stations and those came back. I started to think it could be the internal watchdog.

The third station was deliberately not power-cycled, and it blinks double red (watchdog timeout).

I am constantly writing 0xC1 (193) to 0x1044, which should clear a triggered watchdog, but it does not.
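For reference, assuming the write above is done as a standard Modbus TCP Write Single Register request (function code 0x06), the frame on the wire can be sketched as below. This only builds the request bytes; it assumes 0x1044 is sent as-is as the register address and that unit ID 1 is accepted by the bus controller (both assumptions to check against the X20BC0087 documentation), and `build_write_single_register` is a hypothetical helper name.

```python
import struct

MODBUS_FC_WRITE_SINGLE_REGISTER = 0x06


def build_write_single_register(transaction_id: int, unit_id: int,
                                address: int, value: int) -> bytes:
    """Build a Modbus TCP Write Single Register (FC 0x06) request frame."""
    for name, v in (("transaction_id", transaction_id),
                    ("address", address), ("value", value)):
        if not 0 <= v <= 0xFFFF:
            raise ValueError(f"{name} must fit in 16 bits")
    if not 0 <= unit_id <= 0xFF:
        raise ValueError("unit_id must fit in 8 bits")
    # MBAP header: transaction ID, protocol ID (always 0), length of the
    # remaining bytes (unit ID + 5-byte PDU = 6), unit ID.
    # PDU: function code, register address, register value (all big-endian).
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id,
                       MODBUS_FC_WRITE_SINGLE_REGISTER, address, value)


# Hypothetical watchdog-reset write: value 0xC1 (193) to register 0x1044.
frame = build_write_single_register(transaction_id=1, unit_id=1,
                                    address=0x1044, value=0xC1)
print(frame.hex(" "))  # 00 01 00 00 00 06 01 06 10 44 00 c1
```

Per the Modbus specification, a successful reply to FC 0x06 echoes the request PDU; an exception reply comes back with function code 0x86 plus an exception code, which would at least show that the write reached the slave.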

The logger shows the last error for station 4 (the one with the watchdog timeout) from two days ago. Since then there have been occasional errors on stations 5 and 6 when the link is blocked by cars or people, but nothing that really jumps out at me.

Since the station is offline in AS, resetting the bus controller will not work. Because the station comes back after a power cycle, the problem seems to be on the bus controller side. Can you please download this tool and check whether the bus controller still accepts a connection from it while the PLC reports it as offline?

ModbusTCP Toolbox | B&R Industrial Automation

We also need to know which versions you use for:

  • Automation Studio
  • Runtime
  • Bus Controller SW
  • Bus Controller HW

Stephan

Surprisingly, I lost one of the three stations again, even after a restart. This never happened before the omni antenna reset, which is something I will need to investigate.
ModbusTCP Toolbox shows this on a working device:

And this on the dead device. So it seems the BC0087 gets stuck waiting for an ACK packet that was lost on the way, its watchdog times out, and it never recovers.

I have enabled advanced slave monitoring on the working devices; it seems they occasionally lose comms. RefreshTimeoutCnt tends to go up by a couple of hundred counts each time a timeout happens.
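To correlate those timeout bursts with antenna events, one option is to sample RefreshTimeoutCnt periodically and flag the intervals in which it jumps. A minimal sketch, assuming the samples have already been exported as (timestamp in seconds, counter value) pairs, e.g. from a trace; the export, the threshold, and the `find_counter_jumps` helper are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, RefreshTimeoutCnt value)


def find_counter_jumps(samples: List[Sample],
                       min_jump: int = 50) -> List[Tuple[float, float, int]]:
    """Return (start_time, end_time, increase) for every sampling interval
    in which the counter rose by at least min_jump counts."""
    jumps = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0:
            # Counter went backwards: assume it was reset (e.g. by a restart).
            continue
        if delta >= min_jump:
            jumps.append((t0, t1, delta))
    return jumps


samples = [(0.0, 10), (60.0, 12), (120.0, 230), (180.0, 231), (240.0, 5)]
print(find_counter_jumps(samples))  # [(60.0, 120.0, 218)]
```

The returned windows can then be compared against antenna or ping-monitor timestamps to see whether each burst lines up with a wireless drop-out.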

I am using Runtime B4.92 and AS 4.11.2.75.
BC0087 details as follows:

So the issue does not seem to be on the PLC side. Are you sure that the connection is not permanently interrupted? Do you restart the antenna or just the bus controller?

If this issue happens again, please try to ping the bus controller and see if it responds. If it does not, then it’s most likely the connection that is interrupted.

I would also recommend increasing the polling interval from 100 ms to 200 ms, or to at least 3 times the worst-case ping response time over a run of 100 pings.
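The rule of thumb above can be turned into a small calculation on the results of a ping run. This sketch takes one possible reading of it (the larger of 200 ms and three times the worst round-trip time); lost pings are passed as None and only counted. The function name and that interpretation are assumptions, not a B&R formula.

```python
from typing import Optional, Sequence, Tuple


def recommended_poll_interval_ms(rtts_ms: Sequence[Optional[float]],
                                 floor_ms: float = 200.0,
                                 factor: float = 3.0) -> Tuple[float, int]:
    """Return (suggested polling interval in ms, number of lost pings).

    rtts_ms holds one entry per ping: the round-trip time in ms, or None
    if no reply was received.
    """
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    if not replies:
        raise ValueError("no successful pings; the link itself is down")
    worst = max(replies)
    # Polling interval: at least floor_ms, and at least factor x worst RTT.
    return max(floor_ms, factor * worst), lost


# Example: 100 pings, mostly ~10 ms, one slow reply of 95 ms, two lost.
rtts = [10.0] * 97 + [95.0, None, None]
print(recommended_poll_interval_ms(rtts))  # (285.0, 2)
```

Lost pings are reported separately rather than folded into the worst case, since a run with losses points at the link rather than at the polling interval.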

Your firmware on the bus controller is not the very latest, but I doubt that this is the issue.

Stephan

At this point I have restarted everything in the pipeline: PLC, main antenna, three small antennas, three BC0087 controllers. This morning all of them were dead again.

I do not blame the BC0087s, since they never had this issue before the omni antenna restart. I am not sure where to go from here. I restarted the most easily accessible module and then connected via ModbusTCP Toolbox to change its watchdog timeout to 60000 ms.

Strangely, when I pull the Ethernet cable from a working module, it goes to a double red blink (watchdog error), but after reconnection it resets and works again.

I think we have to take this step by step. We have to figure out whether it is a problem with the antenna or with the bus controller. Increasing the watchdog timeout may make the issue occur less often, but it may not make it disappear. So when you lose the connection, can you ping the bus controller? Does the connection come back when you just restart the antenna?

“Strangely, when I pull the Ethernet cable from a working module, it goes to a double red blink (watchdog error), but after reconnection it resets and works again.”

Why is that strange? It's what I would expect to happen.

Exactly, I expect the same, and it does work correctly, which makes debugging harder.
This morning the module with the increased watchdog timeout was dead again.
I can still ping it, but it does not respond to any Modbus commands.

OK, now when you restart the antenna, does it come back, or do you have to restart the bus controller as well?

Can you post the results from the ping response?

Currently all three modules are dead again.
I still suspect the antenna, since it was the only thing changed recently: my colleague turned it off, cut the PoE cable in the wrong place, crimped two RJ45 connectors, joined them with a female-female coupler, and then turned it back on.

With this in mind, I turned off the antenna and cut and re-crimped the two new connections, just in case.

Then I restarted all three modules, since they did not come back up by themselves.

And here I am, sitting at the ping monitor and watching the omni antenna occasionally stop responding to pings. There is only a wired connection between it and my laptop.

Whenever it can't be pinged, the PLC loses comms to the modules, but they come back once the connection is restored a few seconds later.
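To line those drop-outs up against the PLC logger, it can help to reduce a ping log to a list of outage windows. A minimal sketch, assuming the ping monitor's results have been exported as chronologically ordered (timestamp in seconds, replied) pairs; the export format and the `find_outages` helper are hypothetical.

```python
from typing import List, Optional, Tuple


def find_outages(pings: List[Tuple[float, bool]]) -> List[Tuple[float, Optional[float]]]:
    """Group consecutive failed pings into outage windows.

    Returns (first_failed_timestamp, first_recovered_timestamp) pairs; the
    second element is None if the log ends while the target is still down.
    """
    outages = []
    start = None
    for t, replied in pings:
        if not replied and start is None:
            start = t                   # outage begins
        elif replied and start is not None:
            outages.append((start, t))  # first reply after the outage
            start = None
    if start is not None:
        outages.append((start, None))   # still down at end of log
    return outages


log = [(0, True), (1, False), (2, False), (3, True), (4, True), (5, False)]
print(find_outages(log))  # [(1, 3), (5, None)]
```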

It is still unclear what sequence of failures causes this non-recoverable state, where I can ping the modules but cannot talk to them via port 502.
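One way to tell those states apart is to probe a module at two levels: does TCP port 502 accept a connection at all, and does the slave answer an actual Modbus request? The sketch below sends a single Read Holding Registers request (function code 0x03). The register address, unit ID, timeout, and the `modbus_probe` name are illustrative assumptions, not values from the X20BC0087 documentation; an exception reply still counts as "alive".

```python
import socket
import struct


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise ConnectionError if the peer closes."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by peer")
        buf += chunk
    return buf


def modbus_probe(host: str, port: int = 502, unit_id: int = 1,
                 address: int = 0, timeout: float = 2.0) -> str:
    """Classify a Modbus TCP slave as one of:
    'no_tcp'    - TCP connection refused or timed out,
    'no_reply'  - TCP connected, but no valid Modbus reply arrived,
    'exception' - slave replied with a Modbus exception (it is alive),
    'ok'        - slave returned the requested register.
    """
    try:
        # The timeout also stays set on the socket for send/recv below.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no_tcp"
    with sock:
        # MBAP header (transaction 1, protocol 0, length 6, unit ID) followed
        # by the PDU: FC 0x03, start address, quantity = 1 register.
        request = struct.pack(">HHHBBHH", 1, 0, 6, unit_id, 0x03, address, 1)
        try:
            sock.sendall(request)
            _tid, _pid, length, _unit = struct.unpack(">HHHB", _recv_exact(sock, 7))
            if length < 2:
                return "no_reply"
            pdu = _recv_exact(sock, length - 1)
        except OSError:  # covers socket timeouts and ConnectionError
            return "no_reply"
    return "exception" if pdu[0] & 0x80 else "ok"
```

Run against a failed module (e.g. `modbus_probe("<bus controller IP>")`), "no_tcp" would suggest the Modbus server is not accepting connections at all, while "no_reply" would point to a server that accepts TCP but no longer processes requests.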

I will check the omni antenna's logs, move the PoE power injector much closer to it, and see where that leads.

When the bus controller fails the next time, can you try connecting a cable directly to it and check whether you can connect then? I would also recommend changing the polling cycle from 100 ms to 200 ms.

There is an alternative Modbus solution based on an open-source project. I am not sure whether it makes a difference, but you could give it a try.

archive-br-automation-com/modbusTCP-Automation-Studio: modbusTCP library for Automation Studio

Thanks Stephan for all your efforts!
The situation has changed: I will do that IF it fails next time, not WHEN.
Why? Because, puzzled that the only change had been to the antenna and that re-crimping had made things actively worse, I did a just-in-case firmware upgrade on the main antenna and the three small ones.

Not a single issue since then: I lost three pings out of 8000 and did not care about it at all.

So far everything points to a wireless AP issue, nothing to do with B&R. It still makes me wonder why I could ping the blasted things when they failed, which I will definitely test with a direct connection if this ever happens again.
