NI Linux Real-Time Discussions


Dynamic Memory allocations by NI-Drivers

We are using PXI systems in a realtime environment. I'm currently trying to track down a performance problem that we have seen since we migrated from Phar Lap ETS to NI Linux RT. I carefully designed my code not to allocate or free memory during realtime operation (it's common knowledge that hard realtime requirements and dynamic memory allocation don't go together).

 

To find any hidden allocations/deallocations, I added hooks to my application that track all calls to glibc's malloc/free. To my surprise, there are a lot of allocations/deallocations going on, but they are made by NI/system code. For example, calling a function such as DAQmxWriteAnalogF64 or DAQmxWaitForNextSampleClock allocates memory.

 

Why would they design their drivers this way? It seems a bad choice, especially since NI Linux RT can't handle dynamic memory as well as Phar Lap could (in terms of determinism).

 

Many thanks for your feedback.

Message 1 of 7

Damn, caught us red-handed.

 

Can you upload some reproducing cases for us (ideally differentiated by driver)?

 

Also, can you describe the specifics of your determinism requirements? (I guess you can DM me if need be.)

Message 2 of 7

Hi rtollert,

 

I have created this simple program to demonstrate the allocations I see when using DAQmx (also attached as zip file).

For simplicity, you can compile it directly on the target with:



gcc AllocationDemo.c -L. -lnidaqmx

 

The output I get is this:

Malloc: 24005, Calloc: 0, Realloc: 0, Free: 24004

 

Notice that when you change the number of iterations from 3000 to something else, the number of mallocs/frees changes accordingly. It would be acceptable if there were allocations only in the first few iterations (a warmup phase), but that is not the case here.

#include "NIDAQmx.h"
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <inttypes.h>
#define DAQmxErrChk(functionCall) if( DAQmxFailed(error=(functionCall)) ) goto Error; else

extern void *__libc_malloc(size_t size);
extern void *__libc_calloc(size_t num, size_t size);
extern void *__libc_realloc(void *ptr, size_t size);
extern void __libc_free(void *ptr);

_Atomic int64_t mallocs = 0;
_Atomic int64_t callocs = 0;
_Atomic int64_t reallocs = 0;
_Atomic int64_t frees = 0;

volatile int CountActive = 0;

void *malloc(size_t size)
{
    if (size > 0 && CountActive)
    {
        mallocs++;
    }
    return __libc_malloc(size);
}

void *calloc(size_t num, size_t size)
{
    if (CountActive)
    {
        callocs++;
    }
    return __libc_calloc(num, size);
}

void *realloc(void* ptr, size_t size)
{
    if (CountActive)
    {
        reallocs++;
    }
    return __libc_realloc(ptr, size);
}

void free(void *ptr)
{
    if (ptr)
    {
        if (CountActive)
        {
            frees++;
        }
        __libc_free(ptr);
    }
}

int main(void)
{
    int32_t error = 0;

    int32_t LoopCnt = 0;
    
    double Rate = 1000; /* 1kHz HWTSP loop */
    
    TaskHandle _PXI1Slot5AOTask = NULL;
    DAQmxErrChk(DAQmxCreateTask("PXI1Slot5AOTask", &_PXI1Slot5AOTask));
    
    DAQmxErrChk(DAQmxCreateAOVoltageChan(_PXI1Slot5AOTask, "PXI1Slot5/ao0:3", "", -10.0, +10.0, DAQmx_Val_Volts, NULL));
    DAQmxErrChk(DAQmxCfgSampClkTiming(_PXI1Slot5AOTask, NULL, Rate, DAQmx_Val_Rising, DAQmx_Val_HWTimedSinglePoint, 1));
    DAQmxErrChk(DAQmxExportSignal(_PXI1Slot5AOTask,DAQmx_Val_SampleClock, "/PXI1Slot5/PXI_Trig0"));
    
    DAQmxErrChk(DAQmxStartTask(_PXI1Slot5AOTask));
        

    CountActive = 1; /* Start counting allocations */

    double values[4] = {0.0};
    int32_t writecount;
    bool32 islate = 0;
    
    while (LoopCnt < 3000 && !islate)
    {
        DAQmxErrChk(DAQmxWriteAnalogF64(_PXI1Slot5AOTask, 1, 1, 0, DAQmx_Val_GroupByChannel, values, &writecount, NULL));        
        DAQmxErrChk(DAQmxWaitForNextSampleClock(_PXI1Slot5AOTask, 4, &islate));
        
        LoopCnt++;
    }
 Error:
    CountActive = 0; /* Stop counting allocations */
    
    DAQmxClearTask(_PXI1Slot5AOTask);
    
    printf("Malloc: %" PRId64 ", Calloc: %" PRId64 ", Realloc: %" PRId64 ", Free: %" PRId64 "\n", mallocs, callocs, reallocs, frees);
    
    return (int)error;
}
Message 3 of 7

Hi Krid,

 

Thank you for reporting this and for providing a sample program demonstrating the problem.

 

I tried your sample program on desktop Linux with a PXIe-6363 and set a breakpoint on malloc() to see what's going on, and I see three categories of heap allocations:

  • Lazy/first-time initialization. For example, the first time DAQmxWriteAnalogF64() is called, the data transfer code has to resize some buffers, but then it reuses them throughout the lifetime of the task.
  • Background thread allocations. For example, there is a thread named "MxsProxyGC" which is related to communication with the MXS storage database where DAQmx stores configuration info. This thread wakes up and uses the heap after your program starts counting allocations.
  • Temporary buffers allocated by the main thread. For example, DAQmx allocates temporary buffers for parameter passing when calling into the kernel. In HWTSP applications, it really ought to use the stack or a preallocated buffer, but in this case it's using the heap, so I filed bug 1225929.

For reference, could you post what device model you're using?

---
Brad Keryan
NI R&D
Message 4 of 7

Hi Krid, I'd like to get a better understanding of the application you're working on that initially showed unacceptable RT performance, and of how you went about validating the performance you needed. Your scenario could help others make the migration successfully and help us align our testing methodology with cases such as yours. Would you be able to share more detail with me at the application level?

Message 5 of 7

Thanks for your answers. For the example program I also used a PXIe-6363 card. 

 

The real use case is that the code runs as a VeriStand Custom Device. We have several PXI systems with fully equipped 18-slot chassis (multifunction DAQ cards + FPGA cards) that operate in different locations worldwide. We use them for HIL testing of our own embedded control units. You can find more information here: https://www.ni.com/en-us/innovations/case-studies/19/improve-test-coverage-documentation-and-traceab...

 

We have taken one of the systems out of production use and use it as a test and development environment for the migration to NI Linux RT (once the migration is successful, all systems will be upgraded to Linux RT). The problem I see is simply that a configuration ("System Definition") that works well on Phar Lap causes problems on Linux RT.

 

By "problems" I mean that we see realtime violations (late errors) on Linux RT, even though we now use a more powerful PXI controller (Phar Lap: 8135, Linux RT: 8880). We have 80+ test projects running on these systems; the majority use a 1 kHz PCL rate (i.e. 1 kHz HWTSP), some 2 kHz.

 

Our requirement is simply that these projects run at their designated rates without late errors (and since it works with Phar Lap, we think this requirement is not too far-fetched).

 

I started the Linux RT migration about 6 months ago, and I have spent most of that time trying to debug the late errors, unfortunately without much success. By now I think the problem is not related to our own code but is more likely caused by the system or drivers. For example, when our code runs in a realtime loop (like in my example program), an iteration is "late" because DAQmxWaitForNextSampleClock() takes too long to return.

 

If I could get some help on this from NI, that would be much appreciated. 

Message 6 of 7

I've got new information regarding the "late errors". Although the dynamic memory allocations are certainly not ideal, they are probably not the root cause.

 

I realized that the late errors are related to GPIB communication. Our systems have external power supplies that we control via GPIB. With GPIB communication stopped, the system runs for 12+ hours without late errors. With GPIB active, late errors occur after a few minutes.

 

This happens even if the GPIB communication runs in a separate, non-realtime process. There seems to be interference between the GPIB and DAQmx drivers at the system level. Since this is outside end users' control, we need support from NI to fix it.

 

Please, have a driver/system developer contact me. My company has an enterprise agreement with NI, if that helps anything...

Message 7 of 7