Brief description of the setup: We have a mobile platform where the OEM ECU and various sensors are connected to an X90 PLC. The sensor data and setpoints are exchanged via Powerlink with an APC910. The APC910 runs a hypervisor system with AR and a Debian GPOS, which exchange data via exOS.
We regularly, but not always, see two different exceptions that put the APC runtime into SERV mode:
- 27309: AR-SIOS: Failed system tick event
- 9060: TC#5 Cycle time violation
I already found some useful tips in this thread
PLC in SERV CPU Mode - Ask Questions / Controls & Vision - B&R Community
but before I start to change the setup: What could be the reason that these exceptions sometimes aren’t triggered for a whole week, and sometimes occur every hour? I was not able to reliably trigger this event for troubleshooting. There is no logic in the PLC code of the APC; it is just a transparent data link, so to speak.
- Idle task on APC has 59% CPU usage.
- The APC uses system timer as clock source, while the X90 uses PLK as clock source. Can it be that the PLK communication interferes somehow?
- Is it possible that anything on the Debian system can cause these “outliers”? Network, CPU, memory, disk usage?
- Stack trace of the cycle time violation indicates an issue inside the exOS library.
Sadly, I heard that exOS is discontinued and no longer even listed on br-automation.com, so I don’t expect any support here.
If we were to fully refactor the system, is there a best-practice configuration for exchanging data with a Linux system?
See some screenshots of the exceptions below:
Hi,
I don’t have any experience with exOS or deeper knowledge of the hypervisor system.
But the data I see looks somewhat similar to an issue I had many years ago (using ArWin, not the hypervisor).
That’s why I want to share my thoughts:
The data in the backtrace points towards increased Ethernet communication.
In a general PCI bus architecture (multi master bus), the bus master that is active right now cannot be forced to stop data transfer and release the bus.
That means that while one bus master has control, the other bus masters can’t transfer their data until the active one finishes its transfer. The AR interface cards, but also other PCI devices such as Ethernet interfaces, are such masters. If the Ethernet interface blocks the bus for too long, the AR interface card cannot transfer the I/O data within the configured cycle, which then leads to an I/O scheduler cycle time violation.
That is what happened in my case: under some circumstances the Ethernet interface had a high load peak (in my case caused by remote desktop access to the system via VNC), and in rare cases the cyclic I/O data transfer was too late.
As I said, this happened on a different system, but one that also shared hardware components between two operating systems on one piece of hardware, and it looked somewhat similar.
So it may at least be worth taking this into consideration.
(Back then, our solution was to lower the Ethernet port speed from 1 Gbit to 100 Mbit in the driver settings, but I can’t imagine that this is exactly the solution for your case, too.)
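To check whether the exceptions coincide with Ethernet load peaks, one option is to log the interface throughput on the Debian side and compare the timestamps with the AR logger entries. The following is only a minimal sketch: it assumes the GPOS exposes the standard Linux `/proc/net/dev` counters, and the function names and the threshold are made up for illustration. Whether traffic on the GPOS side really affects AR in your hypervisor setup is exactly the open question, so treat it as a diagnostic aid only.

```python
import time
from datetime import datetime


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(prev, cur, dt):
    """Combined rx + tx bytes per second for each interface in both samples."""
    return {
        name: (cur[name][0] - prev[name][0] + cur[name][1] - prev[name][1]) / dt
        for name in cur
        if name in prev
    }


def log_peaks(threshold_bps, interval_s=1.0, samples=60, path="/proc/net/dev"):
    """Print a timestamped line whenever an interface exceeds threshold_bps."""
    with open(path) as f:
        prev = parse_net_dev(f.read())
    for _ in range(samples):
        time.sleep(interval_s)
        with open(path) as f:
            cur = parse_net_dev(f.read())
        for name, rate in byte_rates(prev, cur, interval_s).items():
            if rate > threshold_bps:
                print(f"{datetime.now().isoformat()} {name} {rate:.0f} B/s")
        prev = cur
```

Calling, for example, `log_peaks(10_000_000, interval_s=1.0, samples=3600)` would log every second in which an interface moved more than roughly 10 MB; the threshold is a placeholder and has to be tuned to the actual traffic.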
About communication between AR and GPOS:
There are two possibilities, depending on the required data transfer speed and volume:
- Using the virtual Ethernet interface between AR and GPOS for IP-based communication (OPC UA, or TCP / UDP)
- Using shared memory between AR and GPOS (in AR there are two libraries for this, ArIscShm and ArIscEvent; for the GPOS there is a C API available. For details, please check Automation Help, for example here: B&R Online Help)
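
For the first option, the Linux side of a UDP exchange over the virtual Ethernet interface could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the frame layout (a sequence counter plus four float values), the function names, and the address and port in the usage note are hypothetical and not a B&R format. The sequence counter is there so that lost or late frames become visible on the receiving side.

```python
import socket
import struct

# Hypothetical frame layout (an assumption, not a B&R format):
# little-endian uint32 sequence counter followed by four float32 values.
FRAME = struct.Struct("<I4f")


def pack_frame(seq, values):
    """Serialize a sequence counter and four floats into one datagram."""
    return FRAME.pack(seq, *values)


def unpack_frame(payload):
    """Inverse of pack_frame; returns (seq, [v0, v1, v2, v3])."""
    seq, *values = FRAME.unpack(payload)
    return seq, values


def missed_frames(prev_seq, seq):
    """Frames lost between two counters, tolerating uint32 wrap-around."""
    return (seq - prev_seq - 1) % 2**32


def receive(bind_addr, port, count):
    """Receive `count` frames and return a list of (seq, values, lost)."""
    results = []
    prev = None
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        for _ in range(count):
            payload, _sender = sock.recvfrom(FRAME.size)
            seq, values = unpack_frame(payload)
            lost = 0 if prev is None else missed_frames(prev, seq)
            results.append((seq, values, lost))
            prev = seq
    return results
```

A call such as `receive("0.0.0.0", 50000, 100)` would collect 100 frames on a placeholder port; the AR side would have to send datagrams with the same layout.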
Best regards!