NI Linux Real-Time Discussions


Linux RT Restarting on Its Own - Need Assistance!!!

Hello,

 

Hopefully someone can explain why one of our CompactRIOs (model cRIO-9045) restarts itself at random times. This CompactRIO controls expensive hardware, so the issue needs to be solved as soon as possible. We first noticed it in the summer of 2018, when it was happening once every week or two.

 

While the test facility was offline in 2019, I contacted NI for assistance. They suggested using the System Event Logs instead, and told me that the location where I had been saving my logs (/home/lvuser/logs) might be the problem. So I moved the logging location to /home/lvuser/natinst/LabVIEW Data/logs and thought I had fixed the problem. They also said that if all else failed, I should try reformatting the target's hard drive.

 

Along came 2020 and the problem came back (why?). It only happened once or twice between June and August, but once was enough to be a major problem, if you know what I mean. 😞

 

So I reformatted the hard drive on 8/20/20, after the restart error occurred on 8/19/20, and thought all was well. I did not change any of the FPGA or real-time code, including the logging to file. The same error occurred again on 9/15, 9/17, and 9/21. So much for best-laid plans!

 

I have Wireshark installed and thought about using it for debugging. Before I do that, I thought I would post this issue on the NI Linux R/T community forum for ideas on where to look.

 

Each error is the same:

Date  Time  1633  41  LabVIEW Real-Time process restarted

 

Here are the last four error reports from /var/local/natinst/log/errlog.txt:

 

####
#Date: Wed, Aug 19, 2020 02:33:57 PM
#Desc: LabVIEW caught fatal signal
17.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0xffffffda01429913
#RCS: unspecified
#OSName: Linux
#OSVers: 4.9.47-rt37-6.1.0f0
#OSBuild: 264495
#AppName: lvrt
#Version: 17.0
#AppKind: AppLib
#AppModDate:


####
#Date: Tue, Sep 15, 2020 03:23:29 PM
#Desc: LabVIEW caught fatal signal
17.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0xffffffda00accf53
#RCS: unspecified
#OSName: Linux
#OSVers: 4.9.47-rt37-6.1.0f0
#OSBuild: 264495
#AppName: lvrt
#Version: 17.0
#AppKind: AppLib
#AppModDate:


####
#Date: Thu, Sep 17, 2020 12:41:55 PM
#Desc: LabVIEW caught fatal signal
17.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0xffffffda019ddf13
#RCS: unspecified
#OSName: Linux
#OSVers: 4.9.47-rt37-6.1.0f0
#OSBuild: 264495
#AppName: lvrt
#Version: 17.0
#AppKind: AppLib
#AppModDate:


####
#Date: Mon, Sep 21, 2020 01:52:33 PM
#Desc: LabVIEW caught fatal signal
17.0 - Received SIGSEGV
Reason: address not mapped to object
Attempt to reference address: 0x0xffffffda02472013
#RCS: unspecified
#OSName: Linux
#OSVers: 4.9.47-rt37-6.1.0f0
#OSBuild: 264495
#AppName: lvrt
#Version: 17.0
#AppKind: AppLib
#AppModDate:
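Since every crash is the same SIGSEGV with a different faulting address, it may help to line the entries up side by side instead of reading them one at a time. Below is a minimal Python sketch, assuming the errlog.txt format matches the excerpts above (a `####` separator, `#Date:` header lines, and an "Attempt to reference address:" line). The function names `parse_errlog` and `days_between` are hypothetical helpers, not part of any NI tool. The sketch extracts the date, signal, and address of each entry and computes the days between consecutive crashes.

```python
import re
from datetime import datetime

# Assumption: errlog.txt entries look like the excerpts in this post:
# "####" separators, "#Key: value" header lines, and free-form lines such as
# "17.0 - Received SIGSEGV" and "Attempt to reference address: ...".
ENTRY_SEP = "####"
ADDR_RE = re.compile(r"Attempt to reference address:\s*(\S+)")
DATE_FMT = "%a, %b %d, %Y %I:%M:%S %p"  # e.g. "Wed, Aug 19, 2020 02:33:57 PM"


def parse_errlog(text):
    """Return one dict per entry with its date, signal name, and faulting address."""
    entries = []
    for chunk in text.split(ENTRY_SEP):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first separator
        entry = {"date": None, "signal": None, "address": None}
        for line in chunk.splitlines():
            line = line.strip()
            if line.startswith("#Date:"):
                entry["date"] = datetime.strptime(line[len("#Date:"):].strip(), DATE_FMT)
            elif "Received SIG" in line:
                # "17.0 - Received SIGSEGV" -> "SIGSEGV"
                entry["signal"] = line.split("Received", 1)[1].strip()
            else:
                match = ADDR_RE.search(line)
                if match:
                    # Kept verbatim, including the doubled "0x0x" prefix the log prints
                    entry["address"] = match.group(1)
        entries.append(entry)
    return entries


def days_between(entries):
    """Days elapsed between consecutive dated entries, rounded to 0.1 day."""
    dates = [e["date"] for e in entries if e["date"] is not None]
    return [round((b - a).total_seconds() / 86400, 1) for a, b in zip(dates, dates[1:])]
```

To use it, copy errlog.txt off the target and run `parse_errlog(open("errlog.txt").read())` on a PC. For the four entries above, `days_between` gives roughly 27.0, 1.9, and 4.0 days.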

 

Some additional notes (please see #4, as I think it could be the issue):

 

1.) Windows 10 GUI (operator interface) - Firewalls all turned off.

 

2.) Ethernet connection to the CompactRIO through a switch that is shared with three or four other PCs.

 

3.) Using LabVIEW 2017 SP1 (32-bit). Note: I can install newer versions of LabVIEW R/T & FPGA. If anyone knows of a version that is more stable than 2017 SP1, please let me know. So far, this version has seemed to be a stable one....

 

4.) The target CPU % is monitored from the W10 GUI PC and ranges from 2.5% to a maximum of about 4.5%, so CPU load does not appear to be the issue.

 

NOTE: I am using System Configuration VIs to monitor the CPU %, the total disk space, and the free disk space once a second. Since I only have one user set up for Web Configuration (admin), I log in as "admin" with a short password. Question: should I set up a new user account with specific access permissions and use that instead? Could using "admin" here be causing problems?

 

5.) FPGA compilation: I chose the "Optimize Performance" strategy because "Default" produced compile errors. Only ~45% of the slices are used. I can add other compilation information as needed.

 

6.) FPGA C-Series Modules (8 - Slot Sequence): 9426, 9476, 9411, 9411, 9411, 9269, 9263, 9263.

 

7.) I am not updating the front panel of the R/T Main VI, but I am updating the front panel of the FPGA Main VI, because the R/T code reads and writes the FPGA front-panel controls and indicators. The FPGA front panel has a few clusters and Boolean controls and indicators.

 

8.) I do not have any VIs set up as Web Services.

 

When I was previously using the PharLap-based cRIO-9024 with LabVIEW 2015 SP1, I did not have any of these restart problems. Note: going backwards is not an option. As a rule of thumb, we try to keep our LabVIEW versions no more than 3 years old and our hardware no more than 5 years old (with $$$ constraints in view, of course).

 

The full error log, which can be retrieved from MAX or by right-clicking the target in the project, includes a few "VI BROKEN" entries for the following LabVIEW libraries:

 

NI_LVConfig.lvlib

NI_FileType.lvlib

NI_PID_pid.lvlib

NI_Matrix.lvlib

 

I don't know if these libraries are a symptom or an upstream cause of the issue.

 

I can post the full error log in my next reply.

 

I DO appreciate help from anyone who has extensive experience with the Linux R/T OS.

 

Cheers,

 

Karl


Oof. It's unfortunate this has dragged on for so long. 

 

In general, unless you have concrete evidence to the contrary, the most stable version of LabVIEW is probably the latest one. Moreover, you're going to be much better supported over the next few years on 2020 than on 2017 SP1. So I would strongly recommend upgrading to LabVIEW 2020, or at least preparing to.

 

Everything you're describing sounds more or less fine to me. Seeing those lvlibs reported as broken might be expected under certain circumstances; it's hard to tell from here.

 

I'd probably suggest uploading /var/local/natinst/log, but not to this forum, since the logs will probably disclose details of your source code (particularly names). Instead, I'd recommend pulling on NI support's chain a little harder.
