
Ubuntu ni-serial crash on boot with 2023 Q4 Linux drivers

I recently ran into a problem with the latest ni-serial drivers (2023 Q4) on Linux. This post describes the issue and a workaround I found, for anyone else who runs into it.

 

System Specs

 

I ran into this issue on both machines I installed the latest drivers on:

 

First Machine:

Computer: Dell Inc. PowerEdge T440

Kernel: 5.15.0-88-lowlatency #98-Ubuntu SMP PREEMPT Mon Oct 9 14:52:46 UTC 2023

Release: Ubuntu 22.04.3 LTS

CPU: Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz

GPU: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller

RAM: 2x8GB Samsung M393A1K43DB2-CWE (DDR4 3200 MT/s)

Motherboard: Dell Inc. 0RMHXK

BIOS Version: 2.19.1

NI Serial Card: Communication controller: National Instruments PCIe-8431/16 (RS-485) Interface

 

Second Machine:

Computer: Dell Inc. Precision 5820 Tower

Kernel: 5.15.0-88-lowlatency #98-Ubuntu SMP PREEMPT Mon Oct 9 14:52:46 UTC 2023

Release: Ubuntu 22.04.3 LTS

CPU: Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz

GPU: NVIDIA Corporation TU117GL [T400 4GB]

RAM: 2x16GB Samsung M393A2K43EB3-CWE (DDR4 3200 MT/s)

Motherboard: Dell Inc. 06JWJY

BIOS Version: 2.30.0

NI Serial Card: Communication controller: National Instruments PCIe-8431/16 (RS-485) Interface
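If you want to collect the same details on your own machine (for example, to include in a bug report), a shell function along these lines should work. It's a rough sketch, not an NI tool: it assumes dmidecode, lsb_release, lscpu and lspci are installed, and since dmidecode needs root it's meant to be pasted into a root shell (e.g. after sudo -i). RAM and GPU are left out; any field it can't read prints as "unknown".

```shell
# Rough sketch: print a system summary in the same "Field: value" style as the
# specs above. dmidecode needs root; fields that can't be read print "unknown".
collect_specs() {
    local vendor model kernel release cpu board_vendor board_model bios card
    vendor=$(dmidecode -s system-manufacturer 2>/dev/null)
    model=$(dmidecode -s system-product-name 2>/dev/null)
    # uname -r gives e.g. "5.15.0-88-lowlatency"; uname -v gives the "#98-Ubuntu SMP ..." part
    kernel="$(uname -r 2>/dev/null) $(uname -v 2>/dev/null)"
    release=$(lsb_release -ds 2>/dev/null)
    # lscpu prints "Model name:   <cpu>"; keep only the value of the first match
    cpu=$(lscpu 2>/dev/null | sed -n 's/^Model name:[[:space:]]*//p' | head -n 1)
    board_vendor=$(dmidecode -s baseboard-manufacturer 2>/dev/null)
    board_model=$(dmidecode -s baseboard-product-name 2>/dev/null)
    bios=$(dmidecode -s bios-version 2>/dev/null)
    # the serial card appears in lspci as "... National Instruments PCIe-8431/16 ..."
    card=$(lspci 2>/dev/null | grep -i 'national instruments')

    printf 'Computer: %s %s\n' "${vendor:-unknown}" "${model:-unknown}"
    printf 'Kernel: %s\n' "$kernel"
    printf 'Release: %s\n' "${release:-unknown}"
    printf 'CPU: %s\n' "${cpu:-unknown}"
    printf 'Motherboard: %s %s\n' "${board_vendor:-unknown}" "${board_model:-unknown}"
    printf 'BIOS Version: %s\n' "${bios:-unknown}"
    printf 'NI Serial Card: %s\n' "${card:-unknown}"
}
```

Running collect_specs as root should print something close to the lists above.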

 

The Problem

After installing ni-serial from the 2023 Q4 Linux drivers and rebooting, I would get a kernel panic. The error showed up in one of two ways, depending on how I booted:

 

Booting normally:

 

/dev/sda2: recovering journal
/dev/sda2: clean, 194907/29237248 files, 5793218/116947200 blocks
[    3.093966] ACPI Error: No handler for Region [SYSI] (00000000bdcf2adb) [IPMI
[    3.094244] ACPI Error: Region IPMI (ID=7) has no handler (20210730/exfldio-2
[    3.094570] ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error
[    3.094880] ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error 

 

 

Booting in recovery mode:

 

[5.944086] Code: Unable to access opcode bytes at RIP 0x7f108afc68f6.
[5.944101] niserialconfig [613]: segfault at 7f92d0fde920 ip 00007f92d0fde920 sp 00007ffcfd23f228 error 14 in libstdc++.so.6.0.30 [7f92d0f66000+111000]
[5.944109] Code: Unable to access opcode bytes at RIP 0x7f92d0fde8f6.
[5.944109] niserialconfig [615]: segfault at 7f6beb456920 ip 00007f6beb456920 sp 00007ffeab7168a8 error 14 in libstdc++.so.6.0.30 [7f6beb3de000+111000]
[5.944127] Code: Unable to access opcode bytes at RIP 0x7f6beb4568f6.
[5.944139] niserialconfig [628]: segfault at 7fed97645920 ip 00007fed97645920 sp 00007ffdbe431568 error 14
[5.944141] niserialconfig [621]: segfault at 7fbaf1b0c920 ip 00007fbaf1b0c920 sp 00007ffd00b6b418 error 14
[5.944144] in libstdc++.so.6.0.30 [7fed975cd000+111000]
[5.944147] in libstdc++.so.6.0.30 [7fbaf1a94000+111000]
[5.944148] Code: Unable to access opcode bytes at RIP 0x7fed976458f6.
[5.944149]
[5.944148] niserialconfig [614]: segfault at 7fce58a46920 ip 00007fce58a46920 sp 00007ffd380da188 error 14
[5.944152] Code: Unable to access opcode bytes at RIP 0x7fbaf1b0c8f6.
[5.944156]  in libstdc++.so.6.0.30 [7fce589ce000+111000]
[5.944162] Code: Unable to access opcode bytes at RIP 0x7fce58a468f6.
[6.261835] CPU: 11 PID: 626 Comm: niserialconfig Tainted: P        W  OE     5.15.0-88-lowlatency #98-Ubuntu
[6.271975] Hardware name: Dell Inc. PowerEdge T440/0RMHXK, BIOS 2.19.1 06/12/2023
[6.288638] Call Trace:
[6.302458]  <TASK>
[6.313394]  show_stack+0x52/0x5c
[6.323293]  dump_stack_lvl+0x4a/0x63
[6.333588]  dump_stack+0x10/0x16
[6.344122]  panic+0x163/0x33b
[6.355375]  do_exit.cold+0x50/0xa0
[6.365654]  do_group_exit+0x3b/0xa0
[6.375181]  get_signal+0xb5/0x990
[6.387012]  arch_do_signal_or_restart+0xde/0x100
[6.403752]  exit_to_user_mode_loop+0xc4/0x160
[6.414827]  exit_to_user_mode_prepare+0xa0/0xb0
[6.423986]  irqentry_exit_to_user_mode+0x9/0x20
[6.433169]  irqentry_exit+0x3b/0x50
[6.443035]  exc_page_fault+0x89/0x190
[6.452156]  asm_exc_page_fault+0x27/0x30
[6.461174] RIP: 0033:0x7fa77cc2e920
[6.470340] Code: Unable to access opcode bytes at RIP 0x7fa77cc2e8f6.
[6.478821] RSP: 002b:00007fff7dd74f28 EFLAGS: 00010202
[6.488261] RAX: 00007fa77cd3da68 RBX: 00007fa77cd45540 RCX: 0000000000000001
[6.501518] RDX: 00007fa77cd47030 RSI: 0000000000000000 RDI: 00007fa77cd44e20
[6.512646] RBP: 00007fff7dd74f30 R08: 00000000022d3d60 R09: 00000000022d8740
[6.521382] R10: 00007fa77cb407f0 R11: 00007fa77cc57df0 R12: 00007fa77cadc838
[6.529787] R13: 0000000000642b39 R14: 00007fa77caddee8 R15: 00007fa77caddf00
[6.537362]  </TASK>
[7.616408] Shutting down cpus with NMI
[7.623585] Kernel Offset: 0x16600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[7.656203] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---

 

 

The Solution (Ubuntu)

If you encounter this issue, you can recover the machine by physically removing the serial card from its PCIe slot and starting the machine again; this time it should boot successfully. Once it is up, uninstall the 2023 Q4 drivers and downgrade to the older 2023 Q2 drivers. The steps below show how to do this on Ubuntu.

 

Uninstall broken drivers

This is not a complete list of NI packages; if you have installed others as well, you may want to downgrade them too. I did not check whether you actually need to downgrade all of the drivers or only the ni-serial driver for this to work.

 

user@computer:~$ sudo apt remove ni-daqmx
user@computer:~$ sudo apt remove ni-hwcfg-utility
user@computer:~$ sudo apt remove ni-visa
user@computer:~$ sudo apt remove ni-serial
user@computer:~$ sudo apt remove ni-software-2023-jammy
user@computer:~$ sudo apt autoremove

 

 

Check that all NI packages have been removed

You can check whether any NI packages are left with the command below. Note: not every package it lists is an NI package, because grep ni matches the letters "ni" anywhere in the line (e.g. gnome-initial-setup/jammy-updates,now 42.0.1-1ubuntu2.3 amd64 [installed,automatic] matches, but has nothing to do with National Instruments).

 

user@computer:~$ apt list --installed | grep ni
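To cut down on those false matches, you can match the prefix on the package name only. Each line of apt list output starts with "name/suite,...", so a minimal sketch is to keep the text before the first "/" and look for names starting with "ni-". This assumes the NI packages you care about use the "ni-" prefix (every NI package named in this post does); anything NI ships under a different name would be missed.

```shell
# Print installed package names that start with "ni-".
# Reads `apt list --installed` output on stdin.
# Assumption: the NI packages of interest all use the "ni-" prefix.
ni_installed_packages() {
    # "ni-serial/jammy,now 1.0 amd64 [installed]" -> "ni-serial";
    # the "Listing..." header and non-NI names are dropped by the grep
    cut -d/ -f1 | grep '^ni-'
}
```

Usage: apt list --installed 2>/dev/null | ni_installed_packages (the 2>/dev/null hides apt's "does not have a stable CLI interface" warning that it prints when piped). If the list looks right, it could also be fed to sudo apt remove, but review it first.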

 

 

Install older drivers

Now that the bad drivers are removed, download the 2023 Q2 drivers (these are the ones I have found to be the most stable):

 

user@computer:~$ wget https://download.ni.com/support/softlib/MasterRepository/LinuxDrivers2023Q2/NILinux2023Q2DeviceDrivers.zip
user@computer:~$ unzip NILinux2023Q2DeviceDrivers.zip
user@computer:~$ cd NILinux2023Q2DeviceDrivers/
user@computer:~/NILinux2023Q2DeviceDrivers$ sudo apt install ./ni-ubuntu2204-drivers-2023Q2.deb
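Before running apt update, it may be worth confirming that nothing from the 2023 Q4 repository setup is left behind and that apt now points only at the Q2 repository. A minimal sketch, assuming NI's repository entries reference download.ni.com (the same host as the .zip above) and live under /etc/apt/sources.list.d/; the exact file names NI uses aren't covered here, so it simply searches every file in that directory and prints each active match with its file name.

```shell
# List active (non-comment) apt source lines that mention download.ni.com,
# prefixed with the file they come from.
# Assumption: NI's repository entries use the download.ni.com host.
# Works for one-line ".list" entries and deb822 ".sources" files ("URIs:" lines).
ni_apt_sources() {
    local dir="${1:-/etc/apt/sources.list.d}"
    # -r: search the directory recursively; -s: silence unreadable-file errors.
    # The pattern only matches if the host appears before any "#" on the line.
    grep -rsE '^[^#]*download\.ni\.com' "$dir"
}
```

Running ni_apt_sources with no argument checks the default directory; if a line still mentions the Q4 release, that repository is still active.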

 

 

Continue regular install process

Now that apt is set to use the older repository, we can continue from step 4 of NI's Linux driver installation instructions as normal. Steps 4, 5 (for serial), 6, and 7 are copied below:

 

user@computer:~$ sudo apt update
user@computer:~$ sudo apt install ni-serial
user@computer:~$ sudo apt install ni-hwcfg-utility
user@computer:~$ sudo dkms autoinstall
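To check that dkms autoinstall actually built the modules for the kernel you're running, you can filter dkms status for that kernel. A minimal sketch: it reads dkms status output and prints only entries in the "installed" state for one kernel (the running one by default). It matches both the older "name, version, kernel, arch: state" and newer "name/version, kernel, arch: state" layouts. NI's module names aren't given in this post, so it doesn't look for specific ones.

```shell
# Print DKMS entries in the "installed" state for one kernel
# (default: the running kernel). Reads `dkms status` output on stdin.
dkms_installed_for_kernel() {
    local kver="${1:-$(uname -r)}"
    # both dkms output layouts put ", <kernel>, " between the version and arch
    grep -F ", ${kver}, " | grep -F ': installed'
}
```

Usage: dkms status | dkms_installed_for_kernel. If the NI modules are missing from the output, they probably weren't built for the running kernel.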

 

 

Restart computer

Finally, all that's left is to reboot and you're set!

 

user@computer:~$ sudo reboot now
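Once the machine is back up (with the serial card installed again), a quick sanity check is to confirm the niserialconfig segfaults from the recovery-mode log above no longer appear. A minimal sketch that counts those lines in a kernel log read on stdin; the pattern accepts both "niserialconfig[613]" and "niserialconfig [613]", since the log in this post was transcribed with a space.

```shell
# Count kernel-log lines reporting a niserialconfig segfault.
# Reads the kernel log (e.g. from dmesg) on stdin and prints the count.
count_niserialconfig_segfaults() {
    # grep -c prints 0 when nothing matches but exits 1; "|| true" keeps the status 0
    grep -c 'niserialconfig *\[[0-9]*]: segfault' || true
}
```

Usage: sudo dmesg | count_niserialconfig_segfaults should print 0 on a healthy boot (Ubuntu 22.04 restricts dmesg to root by default). lspci | grep -i 'national instruments' should also list the PCIe-8431/16 again.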

 

 

Further Discussion

Has anyone encountered this issue before? What would be the proper channels for me to report this to NI so they can investigate/patch their Linux drivers going forward?


Hi ChrisSpace, thank you for reporting this behavior in such great detail! I've filed this bug internally with the team to investigate.

 

The most reliable way to report bugs is to sign in to your ni.com account and submit a service request of the type "Report a Bug." The process is documented in our article here: https://knowledge.ni.com/KnowledgeArticleDetails?id=kA03q0000019gX9CAI&l=en-US

 

Thanks!

 

Kayla A.

Software Product Management


Awesome! It would be great if you could post an update here once the bug has been investigated.
