The processor throws an exception to the operating system if it tries to access an invalid or protected memory location. The B&R operating system logs this type of serious memory violation as a page fault (error 25314).
Processor memory can become invalid as a result of programming errors such as:
- Null or incorrect pointer
- Division by zero
- Accessing an array index that does not exist (e.g. the 11th element of a 10-element array)
- Incorrectly copying memory from one location to another. For example, if you copy X bytes of data to another location where only (X – 50) bytes are free, then 50 bytes of necessary information are overwritten. The next time the overwritten memory is accessed, the data is invalid and the processor cannot successfully execute the command.
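The memory copy case above can be guarded against by comparing the number of bytes to copy with the size of the destination before copying. The following is a minimal C sketch of that idea, not code from any B&R library; the function name `copy_checked` and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies src_len bytes into dst only if dst (dst_size bytes) can hold them.
 * Returns 0 on success, -1 if the copy was refused (nothing is written). */
int copy_checked(void *dst, size_t dst_size, const void *src, size_t src_len)
{
    if (dst == NULL || src == NULL) {
        return -1;   /* null pointer: refuse instead of faulting */
    }
    if (src_len > dst_size) {
        return -1;   /* would overwrite memory past the end of dst */
    }
    memcpy(dst, src, src_len);
    return 0;
}
```

A caller would typically pass `sizeof` of the destination variable as `dst_size` and treat a return value of -1 as a programming error to investigate.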
The following steps can be used as a guide to troubleshoot the cause of a page fault. It is recommended to execute these steps in the provided order.
If the AS program previously ran without triggering a page fault, check the most recent change you made first. If that change involved pointers, arrays, memory copies, string copies, or similar operations, then it is the likely cause of the page fault. Check that you are not accessing an invalid element of an array. If you are doing any memory manipulation, make sure the size of the memory you are manipulating is correct.
If you are using Subversion, consider checking out a version of the program from the last point in time at which you know the page fault was not occurring, and continue development from there.
If you are unsure what you last changed or if reverting the last known changes did not solve the page fault, then move on to section 2.2.
Open the logger in Automation Studio by going to Open → Logger. Make sure the System logger module is visible because error 25314 gets entered into the System log:
Sort by time. Scroll down in the log until you see error 25314. If you do not see error 25314, then refresh the logbook via the following icon:
Select the 25314 entry. Then select the Backtrace tab below:
In some cases, there will be lines in the backtrace with a green arrow next to them. If you double click on these lines, then AS may jump directly to the line of code that caused the page fault. For example, after double clicking on either of the two lines shown in the backtrace below, AS jumps straight to line 9 of the task called Test:
In this case, it is clear that the page fault was caused by an array index out of bounds. The ‘z’ array only has 10 elements, but index 32766 is written to by mistake.
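Once the offending line is known, one common fix for this kind of error is to validate the index before writing. The C sketch below illustrates the check with a 10-element array named `z` as in the example; the function name `z_write` and the use of C rather than the language of the original task are assumptions for illustration only.

```c
#define Z_LEN 10   /* the array in the example has 10 elements, indexes 0..9 */

static int z[Z_LEN];

/* Writes value to z[index] only when index lies inside 0..Z_LEN-1.
 * Returns 1 if the write happened, 0 if the index was rejected. */
int z_write(long index, int value)
{
    if (index < 0 || index >= Z_LEN) {
        return 0;   /* out of bounds: skip the write instead of corrupting memory */
    }
    z[index] = value;
    return 1;
}
```

With this guard, a call such as `z_write(32766, 5)` is rejected instead of writing outside the array.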
If the information in the backtrace does not lead you to a specific line in the program, then move on to section 2.3.
From this point forward it will be extremely helpful if you can determine a specific trigger which causes the page fault in your application. Some examples of these triggers are:
- Every time you navigate to page X in the visualization and click button Y, the page fault happens.
- Every time you save a recipe, the page fault happens.
- Ten minutes after running the machine in mode X, the page fault happens.
If you do not know the trigger for your page fault then that’s okay, but the troubleshooting process will take longer because you will have to wait an arbitrary amount of time until the page fault occurs again. In either case, continue to section 2.4.
Often the backtrace information does not lead you directly to the cause of the page fault. If you are unable to manually identify the programming error via sections 2.1 and 2.2, then there are two libraries that you can use to help track down the cause of the page fault: IecCheck and AdvIecChk.
The IecCheck library is provided with Automation Studio. This library checks for division by zero, null pointers, invalid array indexes, invalid enumeration range accesses, and invalid subranges. If you add the IecCheck library to the project, the library will enter a new error into the logbook (55555) which contains more specific information about the problem it found (such as CheckDivInt for an integer division by 0). The page fault itself will not be triggered because the IecCheck library catches the problem first and sends the PLC into service mode. As a result, you will no longer have access to the backtrace information of the page fault.
Note that if the cause of the page fault is not one of the situations that the IecCheck library checks for, then the page fault will still be entered in the logbook and the IecCheck library will not give you any additional information.
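Conceptually, a check of this kind tests the operands before the risky operation runs and records the problem instead of letting the processor fault. The C sketch below illustrates the idea for integer division. It is not the IecCheck implementation and does not use its API; the type `check_report_t` and the function `div_checked` are hypothetical names, and the report structure merely stands in for a logger entry.

```c
#include <stdint.h>

/* Hypothetical error record; stands in for the entry written to the logger. */
typedef struct {
    int         fault_detected;   /* set to 1 once a problem has been caught */
    const char *check;            /* short description of the check that fired */
} check_report_t;

/* Divides a by b. If the division cannot be executed safely, the problem is
 * recorded in *report and 0 is returned instead of performing the division. */
int32_t div_checked(int32_t a, int32_t b, check_report_t *report)
{
    if (b == 0) {
        report->fault_detected = 1;
        report->check = "division by zero";
        return 0;
    }
    if (a == INT32_MIN && b == -1) {
        report->fault_detected = 1;
        report->check = "division overflow";   /* result not representable */
        return 0;
    }
    return a / b;
}
```

The real library additionally stops the PLC in service mode, as described above; the sketch only shows why the page fault itself never occurs once the check catches the problem first.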
- Add the IecCheck library from the toolbox to the Libraries package of your Logical View:
- Make sure that the IecCheck library exists in the “Library Objects” section of the Software Configuration (Physical View, right click on the CPU, select “Software”, scroll down to the “Library Objects” section).
Rebuild the project and transfer to the PLC.
Trigger the page fault.
Once the PLC is in Service mode, check the logger (Open → Logger). Sort by time. Look for a 55555 entry. If this entry exists, then that means the IecCheck library found a problem and sent the PLC into service mode before the page fault was triggered.
The 55555 entry will tell you the type of error it caught, the name of the task in which it caught the error, and the corresponding task class. For example, in the screenshot below the IecCheck library caught an array out of bounds issue in the task Test from task class 4:
- Fix the programming error which was identified in step 5 and then re-test the code to make sure the page fault is solved.
The AdvIecChk library is a modified version of the IecCheck library. The AdvIecChk library performs the same functions as the IecCheck library, but in addition it provides the following details about the location of the page fault:
- Last executed task class cyclic
- Last executed task name
- Type of programming error
- Variable values from the last executed line of code
- Backtrace pointing to the last executed line of code
The AdvIecChk library is not provided with Automation Studio, but it is included in the zip file along with this document.
- Add an “Existing Library” from the toolbox into the Libraries package of the Logical View.
- Navigate to the location of the provided AdvIecChk folder. Click Finish.
- Make sure the AdvIecChk library is present in the Library Objects section of the Software Configuration:
Rebuild the project and transfer to the PLC.
Trigger the page fault.
Once the PLC is in Service mode, check the logger (Open → Logger). Sort by time. Unlike with the standard IecCheck library, with the AdvIecChk library the page fault entry (25314) will be present in the logger. If the AdvIecChk library found a problem, then immediately prior to the page fault you will see a 55555 entry (but in this case it will just be a warning).
The 55555 entry will provide the following information:
- Task class cyclic of the task with the detected issue
- Name of the task with the detected issue
- Type of programming error
- Max, min valid value of the variable that caused this fault
- Value of the variable that ended up causing this fault
For example, in the screenshot below the 55555 entry indicates an attempt to access an invalid index (32766) of an array with the range [0…9] in task Test within task class 4:
(If the AdvIecChk library did not detect a problem, then the page fault will be by itself in the logger with no accompanying warning 55555.)
- Select the 25314 error and go to the Backtrace. Double click on the “FUNCTION START POSITION” line with a green arrow with the Module name that matches the task name that was identified in the 55555 warning immediately prior to the page fault. For example:
When you double click this line, Automation Studio will take you to the exact line of code where the page fault was encountered.
- Fix the programming error which was identified in step 7 and then re-test the code to make sure the page fault is solved.
Note that these libraries only help to troubleshoot page faults caused by programs written in IEC languages (Structured Text, Instruction List, Function Block Diagram, Ladder Diagram, Sequential Function Chart) plus Automation Basic. These libraries will not find page faults caused by programs written in C or C++.
These two libraries are capable of catching many types of IEC programming errors, but not every kind of error is detected. For example, consider the case that a pointer is pointing to an incorrect (but not invalid/protected) location in the memory. A memory copy operation using this pointer might end up overwriting some part of memory. When the processor tries to access this corrupted memory location later on, it will cause a page fault because the data in memory no longer makes sense to the processor. In this kind of situation, the memory has already been corrupted, so there is no way to tell what line of code caused the corruption.
You should not leave either of these libraries running on a production machine. They should only be added and utilized during active troubleshooting of a page fault. Once the page fault is solved, you should delete the library from the Logical View, rebuild and transfer to the PLC.
If your page fault did not trigger an entry in the logger from these libraries, then move on to section 2.5 to keep troubleshooting.
If you were not able to identify a trigger for your page fault in section 2.3, then skip to section 2.6.
Assuming the IecCheck / AdvIecChk libraries did not catch the cause of the page fault, then next you can systematically disable tasks in order to narrow down which task is causing the problem.
Go to the Physical View.
Right click on the CPU and select “Software”. This opens up the Software Configuration.
Right click on the task you want to disable and select “Disable”. Afterwards, the task name will appear grayed out. To disable more than one task at a time, you can use Ctrl + Click to individually select each task, or you can use Shift + Click to select a consecutive chunk of tasks. Then right click and select “Disable”. Note that you cannot use the Shift + Click method across tasks in different cyclic task classes.
In general, here are the steps to approach this method:
Disable half of the tasks in the software configuration.
Transfer to the PLC.
Try to trigger the page fault using the previously identified trigger.
a. If the page fault still happens, then you know the problem is contained in the half of your tasks which is still enabled. Disable half of the remaining tasks and repeat the process. Continue this iterative process until the page fault no longer occurs. At that point, you know the page fault is being caused by the tasks you most recently disabled.
b. If the page fault does not occur, then you know the problem is contained in the half of your tasks which is currently disabled. Re-enable half of these tasks and repeat the process. Continue this iterative process until the page fault occurs again. At that point, you know the page fault is being caused by the tasks you most recently re-enabled.
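The halving procedure above is a binary search over the task list. The C sketch below shows the logic under simplifying assumptions: exactly one task causes the page fault, tasks can be disabled independently of each other, and each trial enables only the current half of the suspects. The names `find_faulty_task`, `trial_fn`, and `demo_trial` are hypothetical; in practice each "trial" is the manual sequence of disabling tasks, transferring to the PLC, and applying the trigger.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 64   /* arbitrary upper limit for this sketch */

/* Hypothetical stand-in for "transfer to the PLC and apply the trigger".
 * enabled[i] says whether task i is enabled for this trial; the function
 * returns true if the page fault still occurs. */
typedef bool (*trial_fn)(const bool *enabled, size_t task_count);

/* Halves the suspect range [lo, hi) until a single task remains.
 * Returns the index of that task, or task_count if the input is invalid. */
size_t find_faulty_task(size_t task_count, trial_fn fault_occurs)
{
    bool enabled[MAX_TASKS];
    size_t lo = 0, hi = task_count;

    if (task_count == 0 || task_count > MAX_TASKS) {
        return task_count;
    }
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        /* Enable only the first half of the suspects; disable all other tasks. */
        for (size_t i = 0; i < task_count; i++) {
            enabled[i] = (i >= lo && i < mid);
        }
        if (fault_occurs(enabled, task_count)) {
            hi = mid;   /* fault persists: culprit is among the enabled tasks */
        } else {
            lo = mid;   /* fault gone: culprit is among the disabled suspects */
        }
    }
    return lo;
}

/* Simulated trial for demonstration only: pretends one task is the culprit. */
size_t demo_culprit = 5;

bool demo_trial(const bool *enabled, size_t task_count)
{
    return demo_culprit < task_count && enabled[demo_culprit];
}
```

Because the suspect range halves on every trial, about log2(N) transfers are needed for N tasks, for example 4 trials for 12 tasks. If more than one task contributes to the fault, the single-culprit assumption no longer holds and the result has to be verified by hand.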
Once you identify the task which is causing the page fault, you will have to manually read through this task to identify the problem. You can also implement the method described in section 2.6 specifically on this task in order to help find the issue.
This method can be used if a non-IEC language is causing the page fault (and therefore the IecCheck / AdvIecChk libraries do not apply) or if you have identified the problematic task in section 2.5 but cannot determine the specific problem within that task.
- Declare a remanent variable by checking the “Retain” checkbox in the .var file. For example:
Whether you make this a global or local variable depends on whether you have narrowed down which task is causing the page fault.
For more information on remanent variables, refer to the AS Help: Programming → Variables and data type → Variables → Nonvolatile variables → Remanent variables
- Increase the CPU memory configuration to accommodate this new remanent variable (if necessary). To check whether this is necessary, build the project. If you get a build error related to remanent memory, then:
a. Go to the Physical View, right click on the CPU, and select “Configuration”.
b. Expand the “Memory configuration” section and all sub-sections within this section.
c. Make sure that a device has been selected for “Device for memory RemMem” and that the “RemMem memory size” is nonzero.
d. Make sure all of the configured memory sizes are greater than or equal to the used memory sizes.
Refer to the screenshot below.
For more information on memory configuration, refer to the AS Help: Programming → Editors → Configuration editors → Hardware configuration → CPU configuration → SG4 → CPU properties – Memory configuration
- Throughout your program, manually set the remanent test variable to incrementing values. For example:
If you have not identified which task is causing the page fault, then at the very least you are going to want to set this variable to a unique value at the top of each cyclic program.
- The next time the page fault occurs, check the value of this remanent variable via the Watch window. The value of this variable will help you to identify which chunk of code caused the page fault. For example, referring to the screenshot above, say the value of PageFaultTest was 2 once the PLC rebooted into service mode after the page fault. That means that the line of code which caused the page fault was somewhere between lines 30 and 33.
If you are unfamiliar with the Watch window, refer to the AS Help: Diagnostics and service → Diagnostics tools → Watch (variable monitor)
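For a task written in C, the checkpoint technique described in this section might look like the sketch below. It assumes that `PageFaultTest` is the remanent variable; here it is declared as a plain static variable so that the sketch compiles anywhere, whereas on the PLC it would be the variable declared with “Retain” in the .var file. The functions `step_a`, `step_b`, `step_c`, `cyclic`, and `last_checkpoint` are hypothetical placeholders for the real program code.

```c
/* Stand-in for the remanent variable (assumption: on the PLC this is the
 * variable declared with "Retain" in the .var file). */
static volatile unsigned int PageFaultTest = 0;

/* Hypothetical work steps standing in for the real program code. */
static int step_a(int x) { return x + 1; }
static int step_b(int x) { return x * 2; }
static int step_c(int x) { return x - 3; }

/* One cycle of the task: the checkpoint is set to a new value before each
 * chunk of code, so its last value identifies the chunk that was running. */
int cyclic(int input)
{
    int a, b, c;

    PageFaultTest = 1;
    a = step_a(input);

    PageFaultTest = 2;
    b = step_b(a);      /* a fault here would leave PageFaultTest at 2 */

    PageFaultTest = 3;
    c = step_c(b);

    PageFaultTest = 4;  /* end of the cycle reached without a fault */
    return c;
}

/* Returns the last checkpoint reached; mirrors reading the Watch window. */
unsigned int last_checkpoint(void)
{
    return PageFaultTest;
}
```

Once a suspicious chunk is found, adding more checkpoints inside that chunk and repeating the test narrows the search further.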