03-06-2015 09:10 AM - edited 03-06-2015 09:13 AM
Hey gdargaud,
I saw some strange behavior, but I am not sure if it was the right kind of wrong behavior. Here is what I found:
1) I believe the application crashed the other day, because when I checked the VM the user interface was not running.
2) I did not get an error message in the console, only: Aborted (core dumped)
3) I get the following as the last line in the log file:
2015/03/03 10:00:17 +0s - E -> BUSY 0205 00DE 005D 0054XIO:
fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0" after 17 requests (17 known processed) with 0 events remaining.
4) I found two dump files from this date: one was 1.9 GB and the other was around 912 MB.
After some searching, I found that I can run the command
gdb <program> <core dump>
and it seemed like I was able to load the program and its symbols properly. It was at this point that I thought I would ask you for some more guidance. I have two questions:
1) What program do you use to read dump files?
2) How do I use that program to find the callstack you were reporting? A documentation page would be great to share as well.
I hope this core dump file has the evidence we need.
Thank you!
-KP
03-16-2015 04:29 PM
Hey gdargaud,
Any thoughts on my last post?
Thanks!
-KP
03-17-2015 03:39 AM
Hello Kurt,
Good, you got a core dump! gdb is what is commonly used to analyse them, as you started doing. A single 'bt' (backtrace) command in gdb will give you the call stack.
I'm a little surprised that you got _two_ cores, particularly with the same date. Do they have the exact same timestamp? Once a program has crashed it's not like it can generate a second core! And I don't think my prog uses as much memory as 1.9 GB. Do they both work with the tp-tomo executable? I'd like to see both backtraces.
If the backtrace has lines without a function name, like
#10 0xf759cccb in ?? () from /usr/local/lib/libcvi.so
then that is the stage at which you should try giving gdb access to the source code used to compile that version of libcvi.
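To get both backtraces without an interactive session, something along these lines may help. It is only a sketch: the function name dump_backtrace and the core file name in the usage comment are made up for illustration, while -batch and -ex are standard gdb options.

```shell
#!/bin/sh
# Hypothetical helper: print the call stack stored in a core dump.
# Usage: dump_backtrace <program> <core dump>
dump_backtrace() {
    prog=$1
    core=$2
    if [ ! -f "$prog" ] || [ ! -f "$core" ]; then
        echo "usage: dump_backtrace <program> <core dump>" >&2
        return 2
    fi
    # -batch makes gdb exit once the commands given with -ex have run;
    # "bt" prints the backtrace of the current thread (normally the one that crashed).
    gdb -batch -ex "bt" "$prog" "$core"
}

# Example (the core file name is a placeholder):
#   dump_backtrace ./tp-tomo core.1234 > bt-1234.txt
```

Running it once per core file and comparing the two outputs should show whether both dumps really come from the same executable.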
03-31-2015 10:50 AM
So, any deeper analysis of that problem?
08-25-2015 04:39 PM
For any poor soul who has the same issue, this behavior is due to a known libxcb bug.
We tested a small test application on Scientific Linux 6.4 (32-bit, libxcb version 1.8.1) and Scientific Linux 7.1 (64-bit, libxcb 1.9-5) to see if a newer version of libxcb fixed the issue. Unfortunately, as of writing this, the bug still persists in the codebase. This bug does not exist in older versions of the libxcb library such as libxcb-1.5-1. Our suggested workaround is to downgrade your version of libxcb. Here are some instructions on how to do that:
1. Go to [System]->[Administration]->[Add/Remove Software]:
- If it is installed, remove the libxcb devel package and accept removal of all devel packages that depend on it.
2. Download older libxcb 1.5-1 from:
http://rpm.pbone.net/index.php3/stat/4/idpl/18406386/dir/scientific_linux_6/com/libxcb-1.5-1.el6.i68...
3. Run command:
sudo rpm -Uvh --oldpackage libxcb-1.5-1.el6.i686.rpm
(OR)
sudo yum downgrade libxcb-1.5-1.el6.i686.rpm
This should work around the problem. If anyone finds that this workaround fixes their issue, please post it here.
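To confirm that the downgrade took effect, a check along these lines could be run afterwards. This is a sketch only: the function name is_libxcb_1_5 is invented here, and it simply pattern-matches the package string that rpm -q prints.

```shell
# Hypothetical helper: succeed if the given rpm package string is libxcb 1.5.
is_libxcb_1_5() {
    case "$1" in
        libxcb-1.5-*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example:
#   is_libxcb_1_5 "$(rpm -q libxcb)" && echo "libxcb 1.5 is installed"
```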
Thanks,
KP
08-26-2015 04:03 AM
Thanks Kurt.
I'll add a few things:
- To know which version of libxcb you have:
$ sudo yum info libxcb
or
$ rpm -q libxcb
libxcb-1.5-1.el6.i686
- A quick diagnostic is to run this for 15 to 90 minutes:
// Compile with: gcc test.c -m32 -lX11 && time ./a.out
#include <X11/Xlib.h>

int main(void) {
    Display *d = XOpenDisplay(NULL);
    if (d) for (;;) XNoOp(d);   // loops until the X connection dies
    return 1;                   // only reached if the display could not be opened
}
It doesn't crash when built with -m64 on a 64-bit system, but a -m32 build will.
- Make sure to disable yum autoupdates or it will replace your old package. See here: http://www.admon.org/applications/enabledisable-the-automatic-yum-updates/
- Add the following to the main section in /etc/yum.conf: exclude=libxcb
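A quick way to check that the exclude line is in place might look like this. It is a rough sketch: the function name yum_excludes_libxcb is hypothetical, and it only looks for an exclude= line mentioning libxcb without verifying that the line sits in the [main] section.

```shell
# Hypothetical helper: succeed if a yum config file excludes libxcb.
# Usage: yum_excludes_libxcb [config file]   (defaults to /etc/yum.conf)
yum_excludes_libxcb() {
    conf=${1:-/etc/yum.conf}
    # Match an "exclude=" line that mentions libxcb anywhere in its package list.
    grep -Eq '^[[:space:]]*exclude[[:space:]]*=.*libxcb' "$conf"
}

# Example:
#   yum_excludes_libxcb || echo "libxcb is not excluded from yum updates"
```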