I have the following setup:
4PPC70 PLC+HMI combo
Via ethernet to Mikrotik OMNI antenna.
To three WiFi antennas on company land at small pumping stations.
In these pumping stations I use an X20BC0087 + CM8281 for basic data collection via Modbus TCP.
I hoped that ModbusTCP would be quite resilient to signal loss - as soon as the signal is back, I would have expected communication to be restored.
I can ping all antennae and all X20BC0087 modules.
All was working well until I tested turning off the omni antenna for a few minutes.
I would have thought that the master picks up the connection automatically.
Did you check the logger for any information? Which Automation Studio version and runtime version are you using?
After the “plug and play” timer expires the master should try to reestablish communication - normally there is no need to restart Modbus (which is not even possible) or the PLC.
Logger shows the last error regarding station 4 (the watchdog timeout one) from 2 days ago. Since then we have had occasional errors on stations 5 and 6 when they are blocked by a car or people, but nothing obvious that would really jump out at me.
Since the station is offline in AS, resetting the bus controller will not work. When the station comes back after a power cycle, it seems that the problem is on the bus controller side. Can you please download this tool and check whether the bus controller still connects to it when the PLC says it’s offline?
Surprisingly, I lost one of the three stations again, even after restart. This never happened before the OMNI antenna reset, which is something I will need to investigate.
ModbusTCP Toolbox shows this on a working device:
And this on a dead device. So it seems that the BC0087 is locked waiting for an ACK packet that got lost on the way, times out its watchdog and then never restarts.
I have enabled advanced slave monitoring on the working devices, and it seems that they occasionally lose comms. The RefreshTimeoutCnt tends to count up by a couple of hundred each time a timeout happens.
So the issue does not seem to be on the PLC side. Are you sure that the connection is not permanently interrupted? Do you restart the antenna or just the bus controller?
If this issue happens again, please try to ping the bus controller and see if it responds. If it does not, then it’s most likely the connection that is interrupted.
I would also recommend increasing the polling interval from 100 ms to 200 ms, or to at least 3 times the worst-case ping response time you see when you run 100 pings.
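The sizing rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample values and the reading of the advice as "whichever is larger, 200 ms or 3 times the worst ping" are assumptions, not something taken from Automation Studio.

```python
import math

def recommended_poll_interval_ms(rtt_samples_ms, factor=3, floor_ms=200):
    """Suggest a polling interval from measured ping round-trip times.

    Assumption: the advice is read as max(floor_ms, factor * worst RTT),
    rounded up to a whole millisecond.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    worst = max(rtt_samples_ms)  # worst-case ping response in ms
    return max(floor_ms, math.ceil(factor * worst))

# Hypothetical RTTs (ms), e.g. copied from the output of 100 pings.
samples = [4.2, 12.0, 95.5]
print(recommended_poll_interval_ms(samples))
```

With a worst case of 95.5 ms the suggestion exceeds the 200 ms floor; on a link where every ping stays under about 66 ms, the 200 ms floor is what applies.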
Your firmware on the bus controller is not the very latest, but I doubt that this is the issue.
At this point I have restarted everything in the pipeline - PLC, main antenna, three small antennae, three BC0087 controllers. This morning all of them were dead again.
I do not blame the BC0087, since they never had this issue before the omni antenna restart. I am not sure where to go from here. I restarted the easiest-to-access module and then connected via ModbusTCP Toolbox to change the watchdog timeout to 60000 ms.
Strangely, when I pull ethernet from a working module, it goes to double red blink (watchdog error), but after reconnection it resets and works again.
I think we have to take this step by step. We have to figure out if it’s a problem of the antenna or the bus controller. Increasing the watchdog may make the issue occur less often, but it may not disappear. So when you lose connection, can you ping the bus controller? Does the connection come back when you just restart the antenna?
Strangely, when I pull ethernet from a working module, it goes to double red blink (watchdog error), but after reconnection it resets and works again.
Why is that strange? It’s what I would expect to happen.
Exactly, I would expect the same and it works correctly, which makes the debugging harder.
This morning the module with the increased watchdog time was dead again.
I can still ping it, but it does not respond to any Modbus commands.
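A ping only proves that the IP stack answers; it says nothing about the Modbus server on TCP port 502. A minimal sketch of an application-level check is below: it sends one standard Read Holding Registers (function 0x03) request and looks for any Modbus reply. The unit ID, the register address 0 and the IP address in the usage line are placeholders and assumptions, not values from the X20BC0087 documentation.

```python
import socket
import struct

def build_read_holding_request(transaction_id, unit_id, start_addr, quantity):
    """Build a Modbus TCP 'Read Holding Registers' (0x03) request frame."""
    # MBAP header: transaction id, protocol id (always 0), length of the
    # bytes that follow (unit id + function code + address + quantity = 6).
    return struct.pack(">HHHBBHH", transaction_id, 0, 6,
                       unit_id, 0x03, start_addr, quantity)

def modbus_alive(host, port=502, timeout_s=2.0, unit_id=1, start_addr=0):
    """Return True if the device answers a Modbus request at all.

    unit_id and start_addr are assumed values; even an exception
    response counts as 'alive', because the Modbus server replied.
    """
    request = build_read_holding_request(1, unit_id, start_addr, 1)
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(request)
            reply = sock.recv(260)
    except OSError:  # refused, unreachable or timed out
        return False
    # Shortest valid reply is 9 bytes (MBAP + function code + exception code);
    # the transaction and protocol ids must echo the request.
    return len(reply) >= 9 and reply[:4] == request[:4]

if __name__ == "__main__":
    print(modbus_alive("192.168.0.50"))  # placeholder address
```

If the ping succeeds while this returns False, the fault sits above the IP layer, either in the bus controller's Modbus server or in something on the path that drops or stalls TCP traffic.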
Currently all three modules are dead again.
I still suspect the antenna, since it was the only thing that was changed recently - my colleague turned it off, cut the PoE cable at the wrong place, crimped two RJ45s, used a female-female coupler and then turned it back on.
In view of this, I turned off the antenna, then cut and recrimped the two new connections just in case.
Then I went to restart all three modules, since they did not come back up by themselves.
And here I am sitting at the ping monitor and observing how the omni antenna occasionally can’t be pinged. There is only a wired connection between it and my laptop.
When the bus controller fails the next time, can you try to connect a cable directly to it and check if you can connect then? I would also recommend that you change the polling cycle from 100 ms to 200 ms.
There is an alternative Modbus solution based on an open source project. I am not sure if it makes a difference, but you could give it a try.
Thanks Stephan for all your efforts!
Situation has changed - I shall do that IF it fails next time, not WHEN.
Why? Because, puzzled that the only change had been made to the antenna and that recrimping made things actively worse, I went for a just-in-case firmware upgrade on the main antenna and the three small ones.
Not a single issue since then. I lost three pings out of 8000 and it did not care about it at all.
So far all this points to a wireless AP issue, nothing to do with B&R. It still makes me wonder why I could ping the blasted things when they did fail, which is something I will definitely test by direct connection if this ever happens again.