LabWindows/CVI

XIO: fatal IO error 11

The 6.2 machines are off limits for a while, and I won't have access to them for a few more weeks (or months?). They haven't been patched yet, so for now I can't tell whether the cause is the OS difference or the patch. I want to try on a third machine but haven't gotten around to it yet. Busy.

Message 11 of 36
(10,700 Views)

OK, I managed to force a core dump by placing a call to abort() inside an atexit() callback and waiting a week. Apparently the problem is not in my code, but between CVI and X11:

(gdb) bt
#0  0x00bc8424 in __kernel_vsyscall ()
#1  0x0085a861 in raise () from /lib/libc.so.6
#2  0x0085c13a in abort () from /lib/libc.so.6
#3  0x0808f5cf in Unexpected () at MyCode.c:1378
#4  0x0085de9f in exit () from /lib/libc.so.6
#5  0x00c85701 in _XDefaultIOError () from /usr/lib/libX11.so.6
#6  0x00c85797 in _XIOError () from /usr/lib/libX11.so.6
#7  0x00c84055 in _XReply () from /usr/lib/libX11.so.6
#8  0x00c68b8f in XGetImage () from /usr/lib/libX11.so.6
#9  0x004fd6a7 in ?? () from /usr/local/lib/libcvi.so
#10 0x00478ad5 in ?? () from /usr/local/lib/libcvi.so
...
#29 0x001eed9d in ?? () from /usr/local/lib/libcvi.so
#30 0x001eee41 in RunUserInterface () from /usr/local/lib/libcvi.so
#31 0x0808fab4 in main (argc=2, argv=0xbfbdc984) at MyCode.c:1540
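
For readers who want to reproduce the trick, here is a minimal sketch of an exit trap of this kind. It is not the poster's actual Unexpected() function: the names install_exit_trap, mark_clean_exit and g_clean_exit are hypothetical, and raising the core-size limit from inside the program is an added assumption (the thread does not say how core dumps were enabled). The idea is that Xlib's default IO error handler calls exit(), so an atexit() callback that calls abort() turns that silent exit into a core dump with a usable backtrace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical flag: set just before an intended shutdown. */
static volatile int g_clean_exit = 0;

/* Runs on every exit(); aborts unless the exit was announced. */
static void unexpected_exit(void)
{
    if (g_clean_exit)
        return;
    fputs("Unexpected exit(): aborting to force a core dump\n", stderr);
    abort();   /* raises SIGABRT; a core file is written if limits allow */
}

/* Raise the core-size soft limit to the hard limit; returns 0 on success. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Call once near the top of main(), before RunUserInterface(). */
int install_exit_trap(void)
{
    if (enable_core_dumps() != 0)
        perror("core dumps may be disabled");
    return atexit(unexpected_exit);
}

/* Call right before a deliberate exit so the trap stays quiet. */
void mark_clean_exit(void)
{
    g_clean_exit = 1;
}
```

Call install_exit_trap() at startup and mark_clean_exit() from the normal quit path. Any other exit() then ends in abort(), and the resulting core can be opened with gdb to get a backtrace like the one above.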

 

Message 12 of 36
(10,668 Views)

I'm getting back to this critical problem with more info:

- it happens after days (or weeks) in intensive user interfaces (hundreds of updates every second)

- it happens with Scientific Linux 6.1 and 6.5 at least

- it happens in Mandriva 2010

- it happens with LabWindows for Linux 2010, 2013 and 2013p1 (version 13.0.0.29 and 13.0.0.30), although it happens more rarely in 2010

- I have it in several different programs compiled _from_ several different systems.

 

The culprit from the backtrace is always something internal to the CVI lib relating to Xwindows:

#21 0x00a706b1 in _XDefaultIOError () from /usr/lib/libX11.so.6
#22 0x00a70747 in _XIOError () from /usr/lib/libX11.so.6
#23 0x00a6f0a6 in _XReply () from /usr/lib/libX11.so.6
#24 0x00a53c0f in XGetImage () from /usr/lib/libX11.so.6

 

After much mostly fruitless research, my guess is that some X graphic property is used and not released but I'm not an X11 programmer...

 

Now this is a huge pita, because I may have uptime of a few years on my Linux control/command servers but it's useless if my processing programs crash after a few weeks.

 

I'd like this acknowledged as a serious bug by NI and I'm surprised that nobody else has been hit by this problem.

Message 13 of 36
(10,546 Views)

gdargaud,

 

It's been over a month since you last posted. Have there been any updates since then? Could you add a logging feature into your application to see what's happening when the crash occurs?
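
As an illustration of what such a logging feature could look like, here is a minimal sketch. The function name log_line is hypothetical and nothing here comes from the poster's code. Each line is timestamped and flushed immediately, so the last entries are handed to the operating system even if the process later dies inside a library call.

```c
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Append one timestamped line to fp and flush it immediately.
   Returns the number of message characters written, or -1 on error.
   Note: localtime() is not thread-safe; guard it in threaded code. */
int log_line(FILE *fp, const char *fmt, ...)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm *tm_now = localtime(&now);
    va_list ap;
    int n;

    if (fp == NULL || tm_now == NULL)
        return -1;
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm_now);
    fprintf(fp, "%s ", stamp);

    va_start(ap, fmt);
    n = vfprintf(fp, fmt, ap);   /* the caller's formatted message */
    va_end(ap);

    fputc('\n', fp);
    if (fflush(fp) != 0)         /* push the line out of stdio's buffer */
        return -1;
    return n;
}
```

A call such as log_line(fp, "panel %d updated", id) in the update paths would show which operation ran last before the exit.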

Message 14 of 36
(10,527 Views)

Hey gdargaud, 

 

How are you connecting to these machines? Is it SSH? Do you physically go to each station? Do you use some kind of remote client? Did you get a chance to strip down more of the code and try a simpler project? 

 

-KP

Kurt P
Automated Test Software R&D
Message 15 of 36
(10,524 Views)

The only relevant log I managed to get is through an abort() in an atexit() function, as previously mentioned with the provided backtrace.

 

The applications run directly on the machines, with no remote connection. As for stripping down the code, I have only 2 projects that fail like this (there may be more, but since it takes a long time before a crash we may not have noticed it on other projects). The 2nd one is not very big and I could give it to you, but it's been running on my development machine for the last 3 months... without crashing. Yet the logs show 2 crashes in the last year on the production machine(s). So it's hard to reproduce.

Message 16 of 36
(10,516 Views)

Gdargaud, 

 

Did you ever get a chance to make an application that was more graphically intense? Perhaps we can just overload the application and incite the behavior in less than a month. 

What exactly does your application do? I read through the post and did not see any explanation of this. 

How do all of these panels come into play? I am looking for an understanding of the application architecture.

Could you post the entire log file? There might be some more clues in it.

 

Thanks,

 

KP

Kurt P
Automated Test Software R&D
Message 17 of 36
(10,487 Views)

OK, here are two more recent core dumps. If I had the debug symbols for libcvi.so, I guess I'd get a lot more info. Is that at all possible?

(gdb) bt
#0  __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1  0x00962871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x0096414a in abort () at abort.c:92
#3  0x0808f744 in Unexpected () at MyProg.c:1382
#4  0x00965eaf in __run_exit_handlers (status=1) at exit.c:78
#5  exit (status=1) at exit.c:100
#6  0x00c356b1 in _XDefaultIOError (dpy=0x88aeb80) at XlibInt.c:1292
#7  0x00c35747 in _XIOError (dpy=0x88aeb80) at XlibInt.c:1498
#8  0x00c340a6 in _XReply (dpy=0x88aeb80, rep=0xbf82fa90, extra=0, discard=0) at xcb_io.c:708
#9  0x00c18c0f in XGetImage (dpy=0x88aeb80, d=27263845, x=0, y=0, width=60, height=20, plane_mask=4294967295, format=2) at GetImage.c:75
#10 0x005f46a7 in ?? () from /usr/local/lib/libcvi.so
#11 0x0056fad5 in ?? () from /usr/local/lib/libcvi.so
#12 0x00570806 in ?? () from /usr/local/lib/libcvi.so
#13 0x0054a82f in ?? () from /usr/local/lib/libcvi.so
#14 0x00552702 in ?? () from /usr/local/lib/libcvi.so
#15 0x00608eb3 in ?? () from /usr/local/lib/libcvi.so
#16 0x00547474 in ?? () from /usr/local/lib/libcvi.so
#17 0x00543035 in ?? () from /usr/local/lib/libcvi.so
#18 0x00543885 in ?? () from /usr/local/lib/libcvi.so
#19 0x0054396f in ?? () from /usr/local/lib/libcvi.so
#20 0x005b9deb in ?? () from /usr/local/lib/libcvi.so
#21 0x005d3664 in ?? () from /usr/local/lib/libcvi.so
#22 0x005d3a9a in ?? () from /usr/local/lib/libcvi.so
#23 0x005d42ce in ?? () from /usr/local/lib/libcvi.so
#24 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#25 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#26 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#27 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#28 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#29 0x005d42b6 in ?? () from /usr/local/lib/libcvi.so
#30 0x0065f170 in ?? () from /usr/local/lib/libcvi.so
#31 0x0065a620 in ?? () from /usr/local/lib/libcvi.so
#32 0x00470084 in ?? () from /usr/local/lib/libcvi.so
#33 0x002e51ca in ?? () from /usr/local/lib/libcvi.so
#34 0x002e59ba in ?? () from /usr/local/lib/libcvi.so
#35 0x002e5d9d in ?? () from /usr/local/lib/libcvi.so
#36 0x002e5e41 in RunUserInterface () from /usr/local/lib/libcvi.so
#37 0x0808fc29 in main (argc=2, argv=0xbf830c84) at MyProg.c:1544

 

(gdb) bt
#0  0x005816bb in ?? () from /usr/local/lib/libcvi.so
#1  0x004fcad5 in ?? () from /usr/local/lib/libcvi.so
#2  0x004fd806 in ?? () from /usr/local/lib/libcvi.so
#3  0x004d782f in ?? () from /usr/local/lib/libcvi.so
#4  0x005c7fed in ?? () from /usr/local/lib/libcvi.so
#5  0x005c88a8 in ?? () from /usr/local/lib/libcvi.so
#6  0x005c901e in ?? () from /usr/local/lib/libcvi.so
#7  0x005d7554 in ?? () from /usr/local/lib/libcvi.so
#8  0x005dbdc3 in ?? () from /usr/local/lib/libcvi.so
#9  0x005de97c in ?? () from /usr/local/lib/libcvi.so
#10 0x005dea37 in ?? () from /usr/local/lib/libcvi.so
#11 0x004d41b1 in ?? () from /usr/local/lib/libcvi.so
#12 0x004d32c4 in ?? () from /usr/local/lib/libcvi.so
#13 0x004d570e in ?? () from /usr/local/lib/libcvi.so
#14 0x003fa538 in ?? () from /usr/local/lib/libcvi.so
#15 0x00226fdb in ?? () from /usr/local/lib/libcvi.so
#16 0x0031e51f in ?? () from /usr/local/lib/libcvi.so
#17 0x0031e9bc in SetCtrlAttribute () from /usr/local/lib/libcvi.so
#18 0x080aa465 in MasterActionsLog (fmt=0x80fb3f8 "Client program UNEXPECTED exit")
    at MyProg.c:104
#19 0x0808f6ec in Unexpected () at MyProg.c:1377
#20 0x00c08eaf in __run_exit_handlers (status=1) at exit.c:78
#21 exit (status=1) at exit.c:100
#22 0x00a706b1 in _XDefaultIOError (dpy=0x88a4b80) at XlibInt.c:1292
#23 0x00a70747 in _XIOError (dpy=0x88a4b80) at XlibInt.c:1498
#24 0x00a6f0a6 in _XReply (dpy=0x88a4b80, rep=0xbff67050, extra=0, discard=0) at xcb_io.c:708
#25 0x00a53c0f in XGetImage (dpy=0x88a4b80, d=27263827, x=0, y=0, width=16, height=16, plane_mask=4294967295, format=2) at GetImage.c:75
#26 0x005816a7 in ?? () from /usr/local/lib/libcvi.so
#27 0x004fcad5 in ?? () from /usr/local/lib/libcvi.so
#28 0x004fd806 in ?? () from /usr/local/lib/libcvi.so
#29 0x004d782f in ?? () from /usr/local/lib/libcvi.so
#30 0x004d7c4a in ?? () from /usr/local/lib/libcvi.so
#31 0x00595eb3 in ?? () from /usr/local/lib/libcvi.so
#32 0x004d4474 in ?? () from /usr/local/lib/libcvi.so
#33 0x004d4e29 in ?? () from /usr/local/lib/libcvi.so
#34 0x004ce1e7 in ?? () from /usr/local/lib/libcvi.so
#35 0x004d0885 in ?? () from /usr/local/lib/libcvi.so
#36 0x004d096f in ?? () from /usr/local/lib/libcvi.so
#37 0x00546deb in ?? () from /usr/local/lib/libcvi.so
#38 0x00560664 in ?? () from /usr/local/lib/libcvi.so
#39 0x00560a9a in ?? () from /usr/local/lib/libcvi.so
#40 0x005612ce in ?? () from /usr/local/lib/libcvi.so
#41 0x005ec170 in ?? () from /usr/local/lib/libcvi.so
#42 0x005e7620 in ?? () from /usr/local/lib/libcvi.so
#43 0x003fd084 in ?? () from /usr/local/lib/libcvi.so
#44 0x002721ca in ?? () from /usr/local/lib/libcvi.so
#45 0x002729ba in ?? () from /usr/local/lib/libcvi.so
#46 0x00272d9d in ?? () from /usr/local/lib/libcvi.so
#47 0x00272e41 in RunUserInterface () from /usr/local/lib/libcvi.so
#48 0x0808fc29 in main (argc=2, argv=0xbff67ed4) at MyProg.c:1544

As to what my programs do, the one whose core dumps are listed here is a control command system for a particle accelerator and has hundreds of panels, graphs, strip charts, controls, tabs, etc... It communicates with hardware via TCP/IP but doesn't do anything 'weird' besides that (no external dependencies besides CVI). About 120 000 lines of code, although not all in that executable (but that's still the biggest), 2 threads.

 

The other one, which has only crashed with that XIO message twice so far, is also a control command program, but for a radiation tomograph. It communicates with custom hardware via serial port but is otherwise simple: one panel with a few numerics, 3 2D graphs and only 3000 lines of code, no threads. Unfortunately I lost the 2 core dump files in a hard disk crash last week. It also works with simulated hardware and I'm running it hard like that right now to try and get it to crash. If I get it to crash I'm willing to give you the code to try and run it.

Message 18 of 36
(10,479 Views)

I don't know if we can do anything to get you the debug symbols for libcvi.so. 

 

What do you mean by "it also works with simulated hardware?" Does that mean you do not receive any errors if you use simulated hardware or just that you can run the program with simulated hardware? How are you simulating the hardware for this?

 

 

Steven Gloor
Staff Customer Engineer - CTA, CLD
Message 19 of 36
(10,452 Views)

I don't know if it's possible to generate an external file of debugging symbols to add to gdb. Maybe some expert Linux guru knows that... I doubt you'll give me the CVI source code to feed the debugger, but maybe a -ggdb -O1 version of libcvi.so?

 

Since I work with custom hardware, I always put an option in the acquisition software to generate random data _instead_ of connecting to the hardware, so I can debug the software without having to deal with the hardware. But so far I've not been able to produce the XIO crash in simulated mode, even after days of running. I'm going to try it on multiple systems. If I can get a crash, I'll then send you the source code to try. There's no reason why the hardware should influence the crash: in one case it's just a bunch of reads/writes to a serial port, in the other to a socket.
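
A minimal sketch of that kind of simulation switch, assuming the acquisition layer sits behind a single read function. All names here (read_sample_fn, select_source, read_simulated, read_hardware) are hypothetical, and the real-hardware reader is only a stub.

```c
#include <stdlib.h>

/* One acquisition source: either real hardware or a random generator. */
typedef int (*read_sample_fn)(double *out);

/* Simulated source: uniform random value in [0, 1). Returns 0 on success. */
static int read_simulated(double *out)
{
    *out = (double)rand() / ((double)RAND_MAX + 1.0);
    return 0;
}

/* Placeholder for the real driver (serial port or socket read). */
static int read_hardware(double *out)
{
    (void)out;
    return -1;   /* not implemented in this sketch */
}

/* Pick the source once at startup, e.g. from a command-line switch. */
read_sample_fn select_source(int simulate)
{
    return simulate ? read_simulated : read_hardware;
}
```

The rest of the program, including every user interface update, only calls the selected function, so the same drawing code runs in both modes.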

 

Curiously, this morning I had this crash in a completely unrelated context (and on a different system): I opened a PostScript file with ghostview, closed the image by clicking the upper right [x], and pressed [enter] in the gs editor: XIO crash! Looking at the X11 source code (the file and line numbers are explicit in the above core dump), my understanding is that an X11 resource disappeared, in this case the image itself (because I closed it). In the case of CVI I have no idea, as these are programs supposed to run unattended forever (control/command, monitoring...). Can a connection to the X server glitch out momentarily when using it locally? I have no idea, but that doesn't seem right.
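
On the message itself: the number in "XIO: fatal IO error 11" is the errno value that Xlib's default IO error handler prints before calling exit(), and on Linux errno 11 is EAGAIN ("Resource temporarily unavailable"). The sketch below only decodes such a value; the function names are hypothetical. With an XCB-based Xlib (note xcb_io.c in the backtraces) the errno printed may not reflect the real reason the connection was dropped, so treat it as a hint rather than a diagnosis.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format an errno value similarly to Xlib's default IO error message.
   Returns the number of characters written, as snprintf() does. */
int describe_xio_error(int err, char *buf, size_t len)
{
    return snprintf(buf, len, "XIO:  fatal IO error %d (%s)",
                    err, strerror(err));
}

/* True if err means "no data right now" rather than a broken pipe. */
int is_would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```

A would-block errno suggests the connection was not closed with a plain broken pipe; Xlib reports that case with a different message ("X connection to ... broken").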

 

 

Message 20 of 36
(10,445 Views)