11-18-2011 11:28 AM
Hi Everyone,
My background is as an Electrical Engineer with some programming experience (LabVIEW, C, VB, etc.), but nothing approaching a computer-science level of understanding.
I have a test system that contains two hardware cards: the PCI-6033E for analog input to read a thermocouple, and the PCI-DIO-96 to toggle a cooling fan when a set temperature threshold is reached. The program runs up to 36 test stations. Each station individually isn't complicated, but there are 36 of them.
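(For anyone trying to picture one station, the per-station logic boils down to something like the sketch below. This is only an illustration in Python using the nidaqmx API, with made-up channel names, thermocouple type, and threshold; the actual program is a LabVIEW VI.)

```python
# Illustrative sketch only -- the real program is a LabVIEW VI.
# Channel names, thermocouple type, and the 40 C threshold are made up.
import time

import nidaqmx
from nidaqmx.constants import TemperatureUnits, ThermocoupleType

FAN_ON_THRESHOLD_C = 40.0  # hypothetical set point

with nidaqmx.Task() as ai_task, nidaqmx.Task() as do_task:
    # PCI-6033E analog input reading the thermocouple
    ai_task.ai_channels.add_ai_thrmcpl_chan(
        "TempDev/ai0",
        thermocouple_type=ThermocoupleType.K,
        units=TemperatureUnits.DEG_C)
    # PCI-DIO-96 digital line driving the cooling fan
    do_task.do_channels.add_do_chan("FanDev/port0/line0")

    while True:  # one station's monitor loop; the program runs 36 of these
        temperature = ai_task.read()                      # single-point read
        do_task.write(temperature > FAN_ON_THRESHOLD_C)   # toggle the fan
        time.sleep(1.0)                                   # pace the loop
```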
The original VI has been around for quite some time. It was running on Windows XP and needed to be moved to a Windows 7 PC. The VI was originally written to use the Traditional DAQ interface. For the hardware drivers to work on Windows 7, they needed to be upgraded to NI-DAQmx, which meant a bit of recoding to implement the new style.
The VI and computers would run under XP for weeks on end without any trouble; neither the PCs nor the VI was routinely restarted. When the new version was placed on the Windows 7 PCs, the VI would run for around 30 days and then a Windows error box would pop up.
There are three of these systems running the same hardware and the same VI. Two of them had this error occur after 28 days, and the third a few days after that.
Thinking that this could be a memory leak issue, the systems have been restarted and have been running while memory usage is monitored; there does not appear to be any increase in memory consumption over about a week's time.
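(The monitoring itself is nothing fancy: essentially just logging the LabVIEW process's working set over time. A quick script along these lines would do the same job; the process name and the one-minute interval here are assumptions.)

```python
# Rough sketch of the memory check: log the LabVIEW process's working set
# once a minute.  "LabVIEW.exe" and the 60 s interval are assumptions.
import time

import psutil

def find_process(name="LabVIEW.exe"):
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == name:
            return proc
    raise RuntimeError(f"{name} is not running")

proc = find_process()
with open("memory_log.csv", "w") as log:
    log.write("timestamp,rss_bytes\n")
    while True:
        rss = proc.memory_info().rss          # resident set size, in bytes
        log.write(f"{time.time():.0f},{rss}\n")
        log.flush()
        time.sleep(60)
```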
I have also attached the Desktop Execution Traces to this post. I didn't see any perpetually increasing memory buffer sizes or anything like that, though I do wonder what the trace means by the "Handle:" entry. Is that the memory location being addressed?
Also, since this error takes about a month to occur, running repeated stress-test iterations (noting whether the time between errors goes up or down as the program load is increased or decreased) is impractical.
Questions:
1) Does anyone know of any driver issues regarding the support of relatively old hardware with Windows 7?
2) When using the Desktop Execution Trace toolkit I see Memory Allocate events, but no memory release events. Should I be expecting some type of release event to be logged?
3) What does the Handle: address mean in the trace log?
4) As mentioned above, my expertise is more in the electronics and hardware arena than in the details of memory management and Windows PC architecture. It seems to me that a driver issue would likely cause an error to be thrown by Windows itself rather than by LabVIEW, as is the case here. If there were a memory leak, would that error come from LabVIEW or from Windows?
5) To you, what does this evidence point to as being the problem? Bad driver, poor VI programming, other Windows 7 conflict?
6) Any advice in further tracking down this problem?
Thanks,
Brandon
11-18-2011 11:47 AM
In some cases Windows is able to show which *.sys driver the error occurred in. This info should also be somewhere in the saved minidump.
Also check the Windows Event Logs - something probably happened shortly before the Blue Screen.
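For example, something like this pulls the most recent System-log entries so you can see what was logged right before the crash (a small Python wrapper around the built-in wevtutil tool; the log name and entry count are just a starting point):

```python
# Dump the 50 most recent System event-log entries, newest first,
# using the built-in Windows wevtutil tool.  The count is arbitrary.
import subprocess

result = subprocess.run(
    ["wevtutil", "qe", "System", "/c:50", "/rd:true", "/f:text"],
    capture_output=True, text=True, check=True)
print(result.stdout)
```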
11-18-2011 12:15 PM
I suspect it could be a combination of a poor code implementation and LV changes that do not tolerate the poorly implemented code.
Back in the day of the old DAQ stuff, LV did not have a robust method of preventing coding errors like attempting to access hardware using a reference that has been closed. Attempts like that would crash LV and give us the BSOD.
Since then LV has been upgraded to do better error handling and includes an auto-clean-up of resources when LV closes. This was realized (my speculation follows, since I am not in R&D) by not destroying resources when we close them; they are only destroyed after LV exits.
SO....
if you are repeatedly opening refs in a loop, eventually the machine will run out of memory and bad things happen.
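In text form (a Python-flavored sketch using the nidaqmx API, with placeholder channel names; your actual code is a LabVIEW VI, of course), the difference between the leaky pattern and the tidy one looks like this:

```python
# Sketch of the pattern described above, in Python/nidaqmx terms rather than
# LabVIEW.  Device and channel names are placeholders.
import nidaqmx

def leaky_loop(n_reads):
    """Anti-pattern: a new task (reference) is opened every iteration and
    never closed, so handles pile up until something runs out."""
    readings = []
    for _ in range(n_reads):
        task = nidaqmx.Task()                         # new reference each pass
        task.ai_channels.add_ai_voltage_chan("Dev1/ai0")
        readings.append(task.read())
        # task.close() is missing -- the handle leaks
    return readings

def tidy_loop(n_reads):
    """Open the reference once, reuse it inside the loop, close it when done."""
    readings = []
    with nidaqmx.Task() as task:                      # closed automatically
        task.ai_channels.add_ai_voltage_chan("Dev1/ai0")
        for _ in range(n_reads):
            readings.append(task.read())
    return readings
```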
Just my 2 cents,
Ben
11-18-2011 12:15 PM
Did you ever run your new code on the XP machines?
What version of LabVIEW and DAQmx are you using? I had an issue with DAQmx causing crashes on a Windows XP machine that was fixed by a new version of DAQmx. I can't remember the version number off the top of my head, but it was 2 years ago.
11-18-2011 01:06 PM
We will need to know what version of LabVIEW you are using, as well as the DAQmx version. A simple screenshot from MAX >> Software would suffice. Is this occurring with a VI running in the IDE or a deployed executable?
Attaching your code would allow us to look for places the code could be optimized, such as re-opening DAQmx Tasks or arrays built in loops (it looks like you've got a few on a quad-core machine, and the compiler is loop-expanding, judging from the instances where a buffer keeps upping its size by 4).
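As a rough, text-language analogy (Python here, and read_sample is just a stand-in for whatever produces each value; in LabVIEW terms it is Build Array inside a loop versus pre-allocating with Initialize Array and using Replace Array Subset):

```python
# Rough analogy for the "buffer keeps growing" symptom: appending inside the
# loop forces repeated reallocations, while pre-allocating does not.
# read_sample is a hypothetical stand-in for the acquisition code.
def growing_buffer(n_samples, read_sample):
    data = []
    for _ in range(n_samples):
        data.append(read_sample())      # buffer gets re-sized as it grows
    return data

def preallocated_buffer(n_samples, read_sample):
    data = [0.0] * n_samples            # allocate once, up front
    for i in range(n_samples):
        data[i] = read_sample()         # overwrite in place
    return data
```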
11-18-2011 01:21 PM
I'll need to see some statistics on the average time between crashes, how about some more data?
Just kidding, I hate these types of issues. It is always hard to tell the difference between a slow memory leak and a highly improbable chain of events. I have seen the latter, and it can happen that a new machine upsets the delicate balance that was keeping the old code running. As Ben suggests, I think there is probably a source for the delicate balance.
If the OS is really a suspect, you can try running in Windows XP compatibility mode. Something else I try in this case is to artificially increase the cadence of the software to try to decrease the time to failure.
11-18-2011 01:39 PM - edited 11-18-2011 01:42 PM
@Darin.K wrote:
I'll need to see some statistics on the average time between crashes, how about some more data?
Just kidding, I hate these types of issues. It is always hard to tell the difference between a slow memory leak and a highly improbable chain of events. I have seen the latter, and it can happen that a new machine upsets the delicate balance that was keeping the old code running. As Ben suggests, I think there is probably a source for the delicate balance.
If the OS is really a suspect, you can try running in Windows XP compatibility mode. Something else I try in this case is to artificially increase the cadence of the software to try to decrease the time to failure.
XP compatibility mode won't expose the PCI bus to the OS - it simply won't allow a valid assessment - sorry Darin
The unfortunate alignment-of-the-stars theory is valid. I've seen them too, as presented here. Are there a lot of long waits in the program?
11-18-2011 02:13 PM
@Jeff Bohrer wrote:
XP compatibility mode won't expose the PCI bus to the OS - it simply won't allow a valid assessment - sorry Darin
The unfortunate alignment-of-the-stars theory is valid. I've seen them too, as presented here. Are there a lot of long waits in the program?
I should have suspected that Microsoft would take the lazy way out; it looks like they did not provide the same passthrough you can get with VMware, for example. I guess they were simply targeting their software customers. (My experience here is limited by the fact that after NI obsoleted all of my NuBus hardware, I stopped purchasing bus-specific hardware.)
Fortunately I don't suspect Win7 is the problem (Vista would have been a different story). I think it is the one-in-a-billion event attempted millions of times per day.
11-18-2011 03:35 PM
Andrey -
In some cases Windows is able to show which *.sys driver the error occurred in. This info should also be somewhere in the saved minidump.
Also check the Windows Event Logs - something probably happened shortly before the Blue Screen.
That's kind of the problem. I looked a bit at the "help" offered by Windows at the time, but I couldn't make any sense out of all the information that was there. There was a really long .xml file that unfortunately was mostly compuspeak to me. Some names and words were recognizable, but I couldn't get context, so I didn't learn anything from it.
About the code itself: it is definitely not optimized in any way, shape, or form. It appears to have been hacked together by a lab engineer with not much more than basic training in LabVIEW. Kudos to them for getting it to work when it was constructed; however, now there is a problem. Unfortunately there wasn't time (budget) for a rewrite and cleanup, so a quick swap of the hardware-related code was all that was done.
Did you ever run your new code on the XP machines?
Oddly, no. Not for any length of time. A simple functional verification was all that was done with the new code on XP. A long term run on XP will be done in the coming months.
We will need to know what version of LabVIEW you are using, as well as the DAQmx version. A simple screenshot from MAX >> Software would suffice. Is this occurring with a VI running in the IDE or a deployed executable?
The VI is running as an executable deployed from an installer built in LabVIEW 2010.
The MAX Screenshot:
Attaching your code would allow us to look for places the code could be optimized, such as re-opening DAQmx Tasks or arrays built in loops (it looks like you've got a few on a quad-core machine, and the compiler is loop-expanding, judging from the instances where a buffer keeps upping its size by 4).
The trace has a number of memory resizes that increase the memory allocated by 8, up to a certain level, something like 804. This would occur many times and always stop incrementing at that same number (804, or whatever it actually was). There would be a different handle for each of these sequences. The fact that it does not keep incrementing towards infinity makes me think this is not a big issue. In the trace log it is lines 1473 through 1572 and 1632 through 1731, for instance. That section of code is attached here:
Thank you all for being helpful on this one. I appreciate the time you have taken to offer advice and solutions. The top level VI is HILT36A.vi.