Modbus TCP equipment communication failure

Hi all,

I have a project that has been in production for a year, and it worked flawlessly until I did a minor, unrelated software update.
Everything works as before except communication with one Modbus TCP device.
Neither the configuration, the topology, nor the runtime changed; I only had to recompile because of a hardcoded value.
I can contact the equipment from any other device on any other network, but not from the X90 controller.
What is surprising is that the X90 can still communicate with the other Modbus devices, just not with this one.
I am completely clueless as to what to try when ModuleOk stays false and no diagnostic error can be read.

Any clue is welcome.

Controller: X90CP172.48-00, Runtime G4.93, IP 10.11.12.10 (Modbus TCP master)
Non-communicating equipment: Modbus TCP slave, IP 10.11.12.2, unit ID 10
Everything is done from AS 4.12.5.95 with Modbus driver version 1.1.1.0

Thank you for your help
Alain

Hello Alain,

To track down this issue, a Wireshark trace of the Ethernet traffic would be helpful.

Regards
Stephan

Hello Stephan,

I managed to get a tcpdump capture of the traffic, and indeed we see that the faulty equipment (10.11.12.2) keeps trying to reach the X90 controller (10.11.12.10).
However, remotely through the modem, we can access the equipment successfully.
So we put the X90 controller and the equipment on a new switch, and it didn’t change anything: remotely everything is fine, locally it fails.
I suspect the cabling, but rerouting it on the boat will take time.
What else can I check?
flasg_error_capture.zip (11.0 KB)

Thanks for your help

Hello Alain,

I don’t see any request from IP 10.11.12.10 (master) to IP 10.11.12.2 (slave) in the Wireshark trace.

Can you please send a system dump with data+parameter?

Hello Stephan,

You are right, there is no request from master 10.11.12.10 to slave 10.11.12.2.
I recorded traffic while rebooting AR, but failed to capture even a single connection request.
I did, however, get a full session with the other slave (10.11.12.1).
So I tried from Modbus Doctor through the VPN, and again the session is correct and I get a valid capture full of nice Modbus exchanges.

I get fieldbus errors 33058, 33067, 33072, 33074 from AR, which means it is at least trying to connect.
It is still unclear to me whether this is a configuration problem or a hardware problem.

Is the Modbus unit identifier important in Modbus TCP?
Should I set it to some other value?

I will keep trying to capture a tcpdump trace that contains connection attempts.

Thanks for your help

Hello Alain,

The unit identifier is important. The head station of the Modbus TCP slave usually has identifier 1 and if other stations are connected to the same slave, they can be addressed with ascending identifiers.
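
For reference, in Modbus TCP the unit identifier is a single byte at the end of the MBAP header, so it travels with every request the master sends. Here is a minimal Python sketch of what a Read Holding Registers request to unit ID 10 would look like on the wire. The start address and register count are placeholder values, not taken from the actual configuration, and the function name is my own.

```python
import struct

def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP request for function 0x03 (Read Holding Registers)."""
    # PDU: function code, start address, number of registers (all big-endian).
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length (unit id byte + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Placeholder register range; only the unit ID (10) comes from this thread.
frame = build_read_holding_registers(transaction_id=1, unit_id=10,
                                     start_address=0, quantity=2)
print(frame.hex(" "))  # 00 01 00 00 00 06 0a 03 00 00 00 02
```

If a capture ever shows a request from 10.11.12.10 to 10.11.12.2, the seventh byte of the TCP payload (`0a` here) is the unit ID, and it should match what the slave expects.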

A system dump would definitely help.

Hi Alain,

I had a look at the Wireshark trace, and what I can see there looks a bit strange.
Maybe it’s only a “follow-up issue” from your tests reaching the second device via VPN (at least it looks a bit like that to me), or I haven’t understood the whole architecture yet.
So, are the Modbus TCP master and slave connected directly (via a switch), or are they connected via a routing device / VPN?

I’m asking because I’ve seen that the TCP/IP device 10.11.12.2 (which is a Modbus TCP slave, as I understood) is trying to communicate with IP address 10.11.12.10 (which should be the Modbus TCP master and therefore the PLC, right?), but it is using the MAC address of a Teltonika device instead of the B&R PLC’s MAC address. That is, if Wireshark decodes the MAC address correctly; I know Teltonika for their RUT systems, and I assume this device is a VPN router?

A B&R PLC MAC address should always start with 00:60:65 …, but here I can see that the destination MAC address is different: 20:97:27…
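
This prefix check is easy to script when going through a longer capture. A rough Python sketch, assuming the destination MAC addresses have already been extracted as strings; the last three octets below are made-up placeholders (only the prefixes are quoted here), and the helper names are my own.

```python
BR_OUI = "00:60:65"  # vendor prefix expected for a B&R PLC, as stated above

def oui(mac: str) -> str:
    """Return the first three octets of a MAC address as lower-case 'aa:bb:cc'."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(octets[:3])

def is_br_mac(mac: str) -> bool:
    """True if the MAC address carries the B&R vendor prefix."""
    return oui(mac) == BR_OUI

# Destination MAC each slave uses for 10.11.12.10.
# The last three octets are placeholders, not values from the trace.
seen_for_plc_ip = {
    "10.11.12.1": "00:60:65:aa:bb:cc",
    "10.11.12.2": "20:97:27:aa:bb:cc",
}
for peer, dst_mac in seen_for_plc_ip.items():
    verdict = "B&R prefix" if is_br_mac(dst_mac) else "NOT a B&R prefix"
    print(f"{peer} sends to 10.11.12.10 via {dst_mac}: {verdict}")
```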

Looking at the communication of device 10.11.12.1 (I assume this is the first Modbus TCP device, station number 7), the right MAC address is used for 10.11.12.10, starting with 00:60:65…


But also interesting: the slave device of this communication seems to be the same Teltonika device, is this right? So, is the device with IP address 10.11.12.1 really a router and configured as ModbusTCP slave and connected to the PLC as ModbusTCP master?

… sorry, even more questions than answers.
But at least in that Wireshark trace, what is going on on the network looks a bit confusing ;) I think the Modbus TCP slave devices are still using the router’s MAC address in their internal MAC tables because they were reached via VPN (at least in my opinion, this would explain why different devices have different MAC addresses associated with the master’s IP address).

To get to a clean state, all Ethernet devices (including routers and switches!) should flush their MAC tables at the same time. Normally this can be done by switching them all off together for a while (a common power-off time of around 30-60 seconds should be enough).
After that, all devices should update their MAC address tables when IP communication starts again. Of course I don’t think this solves your issue, but hopefully it at least resets the network to a “known situation” so that we can go on investigating…

Could you also please post a picture of your complete network architecture: how the devices are connected, what function each device has (master, slave, router, PLC, manufacturer, and so on), and what IP address each should have? Maybe that helps to interpret the network traces.

Best regards!

Hi All,

I owe you an update on that one.
The root cause has still not been confirmed, but:

  • Electrical rewiring seems to have made the Ethernet traffic much noisier, especially on the Teltonika router. So we put a network switch in place.
  • The Modbus slave never showed up again, but the Ethernet traffic was much better.
  • A bad network mask on the Modbus slave caused it to issue an ARP request. As expected, 255.255.255.255 is not a valid network mask, but the system worked before because we had a DHCP server that would answer the call.
  • Once we fixed the mask, the system worked with the switch; the traffic is noisy, but it works.
  • We updated the X90 runtime to K4.93 during the test and kept it afterwards.
  • Runtime M4.93 has a fix that affects CPU performance (RKFY-7507: X90CP17x: tX90pnp takes 3.5% CPU load).
  • It turns out the CPU is too busy, and this regression caused the network task to be deferred, causing TCP retransmissions that add noise to already noisy traffic, which leads to the failure.
  • We could connect remotely to the equipment with Modbus when the X90 CPU was stopped, which tends to confirm that it is part of the problem.
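
To illustrate the netmask point from the list above: with 255.255.255.255 the slave’s subnet contains only its own address, so it does not treat the X90 as being on the same link. A small Python sketch using the standard `ipaddress` module; the 255.255.255.0 mask is only an assumed example of a corrected value, and the function name is my own.

```python
import ipaddress

def is_on_link(local_ip: str, netmask: str, peer_ip: str) -> bool:
    """True if peer_ip lies inside the subnet defined by local_ip and netmask."""
    network = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return ipaddress.ip_address(peer_ip) in network

SLAVE = "10.11.12.2"
MASTER = "10.11.12.10"

# Bad mask: the subnet is 10.11.12.2/32, so the master is off-link.
print(is_on_link(SLAVE, "255.255.255.255", MASTER))  # False
# Assumed corrected mask (/24): the master is on-link.
print(is_on_link(SLAVE, "255.255.255.0", MASTER))    # True
```

A host that sees its peer as off-link will typically try to send through a gateway rather than directly, which might be related to the router MAC address seen in the earlier trace.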

I will be able to test the new runtime in the coming days, which should confirm this hypothesis: a mix of interference, a runtime regression, and a bad network physical layer.

Thanks for your valuable input.
Alain

Hi Alain, I marked your summary as the solution. But feel free to update us anytime you find something new.