
NI-DAQmx CI task callbacks not called at high sampling rates

Hi

 

I'm using a PXIe-8375 (slot 1), a PXIe-6396 (slot 3) and a PXIe-6614 (slot 4) in a PXIe-1071 chassis. With the NI-DAQmx Python library (latest, v1.0.2), I would like to read out up to 8 analog, 8 digital and 1 counter channel from the PXIe-6396 and up to 8 counter inputs from the PXIe-6614. The sample clock is generated by the PXIe-6396 and fed to the PXIe-6614 through the PXI_Trig0 line, which means I should be able to read out data with a sample clock of up to 14.29 MHz (the max. ADC sample rate of the PXIe-6396).

 

Everything works reliably up to 5 MHz while reading out 8 analog channels, 8 digital inputs (one port), 1 counter (quadrature encoder) and 8 counter inputs (edge-counting mode). I can see that all callback functions are called on time, meaning there is plenty of time before the next readout occurs.

That means: with the above channel selection, I get 11 tasks/callbacks (1 AI, 1 Enc, 1 DI, 8 CI). All 11 callbacks are called and executed within 6 ms, and I then need an additional 6 ms to send out the raw data. With a readout rate of 50 Hz, that leaves 20 ms - 2 * 6 ms = 8 ms before the next callbacks are called.
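
To spell out the arithmetic (plain numbers from the measurements above):

period_ms = 1000 / 50      # 20 ms between callback rounds at 50 Hz
callbacks_ms = 6           # all 11 callbacks execute within 6 ms
send_ms = 6                # sending out the raw data takes another 6 ms
margin_ms = period_ms - callbacks_ms - send_ms
print(margin_ms)           # 8.0 ms of headroom before the next round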

 

If I then switch to a 10 MHz sample clock, not all callbacks are called/executed, not even late. The problem is with the CI callbacks: at 10 MHz only 4 CI callbacks are called, and at 14.29 MHz no CI callback is called at all. The other callbacks (AI, DI, Enc) are fine; I'm "just" missing some CI callbacks, and I don't get any error messages. With the full channel selection I should still get 11 tasks/callbacks, but only 7 are actually called. These 7 are called and executed within 8 ms, so there should be enough time to call the remaining 4 CI callbacks.

 

It gets even more mysterious when I select only part of the counter inputs on the PXIe-6614 (at 10 MHz):

  1. Channels 1, 2, 3, 4: All works fine
  2. Channels 1, 2, 6, 7: Only channel 1 and 2 callbacks are called
  3. Channels 2, 3, 4, 5: Only channel 2 and 3 callbacks are called
  4. Channels 1, 5, 6, 7: Only the channel 1 callback is called; additionally I get a nidaqmx.errors.DaqReadError saying that the application is not able to keep up with the hardware acquisition
  5. Channels 4, 5, 6, 7: All works fine
  6. Channels 3, 4: All works fine

Before executing test no. 6 above, I had assumed that there are two groups of channels that cannot be read out together at 10 MHz, but test no. 6 disagrees with that theory.

 

Some other maybe helpful information:

- A single CI callback call at 10 MHz needs 0.98 ms to process.

- With the full channel selection, all 11 tasks start without error (task.start() is inside a try/except block; I don't get an exception).

 

Any help is greatly appreciated!

 

Stephan

Message 1 of 11

Sorry, I'm no help with anything related to Python callbacks.  Just wanted to give a little nudge for a small troubleshooting step.

 

Have you tried a lower callback rate?  There's longstanding tribal knowledge around here that reading from DAQmx task buffers at ~10 Hz is a good sweet spot for a wide range of sample rates.  With 11 distinct tasks and callbacks, you may want to dial back to somewhere in the 2-5 Hz range, at least for troubleshooting purposes.
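
In nidaqmx Python terms, the callback rate follows from the sample count passed to register_every_n_samples_acquired_into_buffer_event, so dialing it down could look roughly like this (sample_rate, my_callback and task are placeholder names, and task is assumed to be an already-configured continuous task):

sample_rate = 10_000_000                     # 10 MHz sample clock
callback_rate_hz = 5                         # 2-5 Hz for troubleshooting
samples_per_callback = sample_rate // callback_rate_hz

def my_callback(task_handle, event_type, num_samples, callback_data):
    # read num_samples per channel from the task here
    return 0                                 # DAQmx callbacks must return an int

task.register_every_n_samples_acquired_into_buffer_event(
    samples_per_callback, my_callback)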

 

Another thought: if all your tasks are synced via a shared sample clock, can you have just one actual callback function and then read data from all 11 tasks within that function? Again, at least as an experiment for troubleshooting purposes (in case the main program architecture encapsulates tasks in a way that makes global-like access more difficult).
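
A rough sketch of that idea (take it with a grain of salt given my disclaimer above; all_tasks, master_task and samples_per_callback are placeholder names, and the non-master tasks would not register any callback of their own):

def read_all(task_handle, event_type, num_samples, callback_data):
    # one callback, registered only on the master task, reads from every task
    for t in all_tasks:
        data = t.read(number_of_samples_per_channel=num_samples)
        # ... hand `data` off for processing / sending ...
    return 0

master_task.register_every_n_samples_acquired_into_buffer_event(
    samples_per_callback, read_all)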

 

 

-Kevin P

Message 2 of 11

Based on earlier discussions with Kevin Price, I am now controlling the callback rate in Python like this:

import functools, nidaqmx

for index_task, task in enumerate(all_tasks):
    task.timing.samp_quant_samp_mode = nidaqmx.constants.AcquisitionType.CONTINUOUS
    # buffer sized for 20 s of data: exaggerated, but manageable on 1 MHz single-ADC cards
    task.in_stream.input_buf_size = 20 * config_object.master_samp_clk
    # callback every second's worth of samples (my real-time constraints are just graphical display, no pressure 😉)
    task.register_every_n_samples_acquired_into_buffer_event(
        config_object.master_samp_clk, functools.partial(callback_nidaq, index_task))

 

I guess you are using the register_every_... API call to set a 20 ms callback interval; you can tune it toward Kevin's recommendations.

 

When debugging, I was dumping task.in_stream.avail_samp_per_chan for all tasks from the callbacks. With high CPU loads, callbacks could be delayed a lot, and the dumps proved it. But I would get errors at some point, which you do not 😞

 

If you do not set task.in_stream.input_buf_size, you can dump it; it will tell you what nidaqmx has decided through its internal heuristics. That could perhaps explain why you don't get any error (well, I am not very optimistic: I have observed on my cards that nidaqmx does not set seconds of buffering, rather just a few callbacks' worth, and it is probably even worse for very-high-rate cards).
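
For example (all_tasks being a placeholder for your list of tasks), this prints both the heuristic buffer size and the current backlog:

for i, t in enumerate(all_tasks):
    print(f"task {i}: input_buf_size={t.in_stream.input_buf_size}, "
          f"avail_samp_per_chan={t.in_stream.avail_samp_per_chan}")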

 

Sorry if you already tried this first level of observability

Message 3 of 11

Lowering the callback rate to 10 Hz results in this error:

nidaqmx.errors.DaqReadError:  The application is not able to keep up with the hardware acquisition. Increasing the buffer size, reading the data more frequently, or specifying a fixed number of samples to read instead of reading all available samples might correct the problem.

So I assume the 50 Hz callback rate is needed.

 

I rewrote the code to read data from all tasks in one callback function, called by the master task. The behavior is the same: I cannot read all counter channels above a 5 MHz sampling rate. However, I get a different error message:

nidaqmx.errors.DaqReadError: Onboard device memory overflow. Because of system and/or bus-bandwidth limitations, the driver could not read data from the device fast enough to keep up with the device throughput. Reduce your sample rate. If your data transfer method is interrupts, try using DMA or USB Bulk. You can also use a product with more onboard memory or reduce the number of programs your computer is executing concurrently.

I made sure that the transfer mechanism of all tasks is set to DMA. I'm not sure what is meant by "onboard device memory": the internal buffer memory of the PXIe-6614 or the RAM of the readout computer?
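
For reference, this is roughly how I set/verify it per channel (ci_tasks and ai_task are just placeholder names for my task objects; the attributes are the per-channel data-transfer properties of the nidaqmx Python API):

from nidaqmx.constants import DataTransferActiveTransferMode

ai_task.ai_channels.all.ai_data_xfer_mech = DataTransferActiveTransferMode.DMA
for ci_task in ci_tasks:
    ci_task.ci_channels.all.ci_data_xfer_mech = DataTransferActiveTransferMode.DMA
    print(ci_task.name, ci_task.ci_channels.all.ci_data_xfer_mech)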

 

Stephan

Message 4 of 11

Yes, I do use "register_every_...", but I currently do not set task.in_stream.input_buf_size. I will try playing around with it to see if it helps.

 

Thanks to both of you for your replies!

Message 5 of 11

If I do not set the buffer size manually, nidaqmx assigns 5x the buffer size it would actually need:

  • 1 MHz sample rate, 50 Hz buffer readout rate = 100'000 samples buffer size (20'000 would be needed)
  • 10 MHz sample rate, 50 Hz buffer readout rate = 1'000'000 samples buffer size (200'000 would be needed)

I'm not able to set a smaller buffer size than the auto-calculated one; I can only assign a bigger one, which doesn't change the behavior. I've also tried setting the buffer size really big and then reducing the buffer readout rate, again without a change in behavior.

 

Concerning the readout computer:

  • The RAM of the computer is 2 x 16 GB DDR5 running at 4800 MHz; I don't think this causes the problem.
  • The computer is running RHEL 8; the currently installed driver is version 23.3.0 (major 23, minor 3, update 0). There is a newer version (24 Q4), but no issues related to mine are listed for the previous driver versions. As I'm not allowed to update the driver myself, this would require some time to test...

 

But I think I have found the issue now:

  • The PCIe-8375 card that we use is connected to PCI Express, and when I query the PCI information it shows two signal processing controllers, the NI PXIe-6614 and NI 799f (I assume this is the PXIe-6396). Both use PCIe v1 at 2.5 GT/s, but the PXIe-6396 is connected with 4 lanes, whereas the PXIe-6614 is connected with only 1 lane. One lane has a raw data throughput of 250 MB/s, which is below the 320 MB/s needed for 8 channels of int32 size at a 10 MHz sampling rate (but fine at 5 MHz).
  • In any case, the fiber-optic connection between the PXIe-8375 and the PCIe-8375 is limited to 838 MB/s according to NI. I assume this uses PCIe v1 with 4 lanes, giving 1 GB/s in theory and the stated 838 MB/s in practice after overhead. Using 8x AI @ int32, (8+1)x CI @ int32 and 1x DI @ int8 at 10 MHz results in 690 MB/s of raw data (see the arithmetic sketch after this list) - too much for the PXIe/PCIe-8375?
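
The arithmetic behind those two numbers, as a quick sanity check:

rate = 10e6                               # 10 MHz sample clock
lane_pcie_v1 = 250e6                      # ~250 MB/s raw per PCIe v1 lane
ci_6614 = 8 * 4 * rate                    # 8 CI channels at int32 on the PXIe-6614
total = (8 * 4 + 9 * 4 + 1 * 1) * rate    # 8 AI + (8+1) CI at int32 plus 1 DI port at int8
print(ci_6614 / 1e6, total / 1e6, ci_6614 > lane_pcie_v1)   # 320.0 MB/s, 690.0 MB/s, True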

 

I have seen that there is a newer card to control an NI chassis, the PXIe-8301, which uses Thunderbolt with a speed of up to 2.3 GB/s. I guess this would definitely solve the issue with the connection between the chassis and the computer. However, I don't know whether it would solve the issue of the PXIe-6614 using only 1 lane on the PCIe bus.

Can the PXIe-6614 use 4 lanes at all? Otherwise it would not make sense to buy a PXIe-8301...

 

Stephan

Message 6 of 11

I checked the data sheet of the PXIe-6614 (attached) and it seems to imply that the form factor is x1 only, so this would be a bottleneck.

 

==================================================================================

 

A big buffer and a low readout rate is effectively the experiment you had to perform. Unfortunately it doesn't help.

 

What is weird is that you don't get:

- any error, since you should not be able to absorb the 8 data streams from the card to the computer (so the card's onboard memory would not be freed)

- or any warning at the config/start step indicating that some data streams would be dropped because the PCIe link is not fast enough

 

You could probably check what is happening by dumping all "task.in_stream.avail_samp_per_chan" values in one callback (I just rechecked that this works: I have 2 callbacks and I can dump the info for all tasks from any callback). You would see whether some tasks are really alive... even if they don't give you any callback.
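
A rough sketch of that check inside a single callback (all_tasks and last_avail are placeholder names):

last_avail = {}

def callback_nidaq(task_handle, event_type, num_samples, callback_data):
    for i, t in enumerate(all_tasks):
        avail = t.in_stream.avail_samp_per_chan
        if avail == last_avail.get(i):
            print(f"task {i} ({t.name}) seems stalled at {avail} samples")
        last_avail[i] = avail
    return 0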

Well, in theory these counters should not be running, otherwise you would get an error.

 

 

Message 7 of 11

I was looking at the specifications of the PXIe-6614 as well and saw the x1, but the same entry exists in the PXIe-6396 specification sheet. And the PXIe-6396 is connected with 4 PCIe lanes and can run at 10 MHz with 8 channels at int32.

So either it is a bug in the PXIe-6396 specification sheet, or the form factor is not the number of lanes but something else (maybe the physical width of the card in number of slots?).

 

============================================================

 

I've added a statement to print out the "task.in_stream.avail_samp_per_chan" in my callback. In my test script I have one callback function for all tasks. Before I read out any buffer, I get:

  • Enc task: (encoder master task on the PXIe-6396; I didn't print that out because of the way the script works, but it will be fine for sure)
  • AI task: 203256
  • DI task: 203264
  • CI task 0: 1055
  • CI task 1: 1023
  • CI task 2: 991
  • CI task 3: 991
  • CI task 4: 204544
  • CI task 5: 204768
  • CI task 6: 204928
  • CI task 7: 205056

I would expect a bit more than 200'000 samples in each buffer. It looks like the PXIe-6614 cannot acquire more than 4 channels at 10 MHz.

If I run the test at 1 MHz sample rate, I get slightly more than 20'000 samples in each buffer, as expected.

Message 8 of 11

To be honest, the x1 surprised me a bit. I remember now that we looked at the full 63xx product ecosystem, and I rechecked that the PCIe-6376 is x4 (3.5 MS/s/ch and only 16-bit resolution). So a 6386 or 6396 should be the same or more. Yes, the PXIe-6396 specification data seems to be wrong.

 

========================================================================

 

Have you dumped it several times to see the evolution? BTW, the value keeps increasing if you do not read the data (but then it is good to size the buffer big enough...) and you still get the callbacks.

But even without this evolution, it really looks like some CI tasks are just dropped without warning or error. That is beyond my knowledge 😞

Message 9 of 11

Yes, I have tried to dump it several times. The "working" CI tasks count up normally until the buffer is full, whereas the "non-working" CI tasks do not count up.

Example:

  • 1st callback:
    • CI task 0: 1055
    • CI task 1: 1023
    • CI task 2: 991
    • CI task 3: 991
    • CI task 4: 205184
    • CI task 5: 205376
    • CI task 6: 205536
    • CI task 7: 205664
  • 2nd callback:
    • CI task 0: 1055
    • CI task 1: 1023
    • CI task 2: 991
    • CI task 3: 991
    • CI task 4: 404096
    • CI task 5: 404224
    • CI task 6: 404352
    • CI task 7: 404512

It looks to me as if the first 4 tasks acquire ~1000 samples and then just die without an error message (or the error message is swallowed by the nidaqmx Python lib, who knows...).

I've tried increasing the buffer size to a ridiculous 500 million samples per channel, with the same behavior. I do get an allocation error when trying to set a buffer size of 1 billion samples per channel, so my assumption is that there would be enough onboard memory to keep the samples.

 

I've reached out to our NI contact to clarify where the bottleneck is, the PXIe/PCIe-8375 or the PXIe-6614. I will also try to update the nidaqmx driver, but I don't expect a change in behavior.

 

Thanks a lot for your help!!

Message 10 of 11