NIDAQmx threading multichannel streaming lag

v_pondfroth · ‎08-20-2024

I am building a GUI in Python 3.12.4 to stream data in real time from a BNC-2090A via an NI PCIe-6363. I want to sample 16 channels at 1000 Hz, and it works for 1 channel. The problem: when I add more than one AI voltage channel, the stream lags, gaps appear in the data, and the acquisition can no longer keep up. With each additional channel, the problem gets worse [Case A: SampleRate = 1000]. I can work around this by reducing the sampling rate by a factor equal to the number of channels [Case B: SampleRate = 1000/n_chan]. But the hardware should be able to keep up with 16 channels at 1000 Hz without issue, so I think this is an operator or coding error. I am open to suggestions and any advice, please!

I am using the main thread to update a matplotlib canvas with tkinter and listen to a toggle button callback (to start or stop acquisition). I created a background thread to run a DAQ function that uses AnalogMultiChannelReader functionality to record at least 1 channel at 1 kHz sampling rate. In the comparison below, both cases use n_chan=3.

Case A: SampleRate = 1000

Size of data array being plotted ("ai_data"): (3,3000)

Mean iteration duration at 20 seconds: 212.7 ms

Size of buffer queue ("buff_data"): (4, 30)

Case B: SampleRate = 1000/n_chan

Size of data array being plotted: (3,3000)

Mean iteration duration at 20 seconds: 5.2 ms

Size of buffer queue ("buff_data"): (4, 30)

In writing this, I realized (perhaps incorrectly, or by fortunate coincidence?) that the size of the buffer queue should not be the same in Case A and Case B if the sampling rates are different. So I changed the number of samples per channel passed to the task's cfg_samp_clk_timing (in the DAQ function, the parameter is "Nsamples") to keep the acquisition duration consistent (i.e. grabbing samples 3x faster means acquiring 3x the number of samples to allow the task the same amount of time). Changing the definition from "Nsamples = 30" to "Nsamples = 30*n_chan" reduced the lag for n_chan=3, but the solution does not scale to 7 or 16 channels.

Snippet below, full code attached.

with open(os.path.join('meta_data','config.yml'), 'r') as file:
    yaml_config = yaml.safe_load(file)
for each_var in yaml_config:
    for k,v in each_var.items(): locals()[k] = v

def now():
    return Quantity(time.time_ns(),'ns',scale='ms')

last_update = now()

def DAQ(q,n_chan,chan_names,stop_event):
    global last_update
    with nidaqmx.Task() as task:
        # task.ai_channels.add_ai_voltage_chan(
        #     physical_channel = "Dev3/ai0",
        #     name_to_assign_to_channel = "Dummy channel"
        # )
        for nc,cn in zip(range(n_chan), chan_names):
            task.ai_channels.add_ai_voltage_chan(
                physical_channel = "Dev3/ai{0}".format(nc),
                name_to_assign_to_channel = cn
            )

        task.timing.cfg_samp_clk_timing(SampleRate, source="", sample_mode=AcquisitionType.CONTINUOUS, samps_per_chan=Nsamples)
        reader = AnalogMultiChannelReader(task.in_stream)
        d_out = np.zeros([n_chan, Nsamples])
        task.start()

        total_read=0
        itercount=0
        old_time = Quantity(time.time_ns(),'ns',scale='ms')
        
        while not stop_event.is_set():
            # try:
            reader.read_many_sample(data=d_out, number_of_samples_per_channel=Nsamples)# read from DAQ
            now_time = Quantity(time.time_ns(),'ns',scale='ms')
            t_out = np.linspace(0,(now_time-old_time).real,np.shape(d_out)[1]).reshape(1,np.shape(d_out)[1])
            out = np.around(np.append(t_out,d_out.astype(np.uint8),axis=0), 2) # Round all values to 2 decimals to avoid overflow
            q.put_nowait(out)
            old_time = Quantity(time.time_ns(),'ns',scale='ms')
            itercount+=1
            total_read+=np.shape(d_out)[1]
            # except:
            #     task.stop()
            #     print('Task crashed after {0} samples.'.format(total_read))
            #     var_sizes(list(locals().items()))
            #     task.start()
            
            last_update = now()

ft_06 · ‎08-20-2024

My 2 cents

It seems to me you are in the same situation as in https://forums.ni.com/t5/Multifunction-DAQ/Slow-python-DAQmx-reading-amp-writing/td-p/4389116

30 samples per channel at 1 kHz means your loop executes every 30 ms. This is for a GUI, so go for 300 ms, for example (well, I think this is a GUI to configure the streaming, not to display it, so you don't even care; go higher).

nidaqmx-python examples do:

task.timing.cfg_samp_clk_timing(1000, sample_mode=AcquisitionType.CONTINUOUS)

task.register_every_n_samples_acquired_into_buffer_event(1000, callback)

Thus the callback runs every second.
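As a rough illustration of that pattern, here is a minimal sketch of how an every-N-samples callback could hand blocks to a queue.Queue that the GUI thread drains. The names make_callback, FakeReader, and run_stream are hypothetical, and the channel string only reuses the "Dev3" device name from the question. The four-argument callback signature and the integer return value follow my reading of the nidaqmx-python documentation. FakeReader stands in for AnalogMultiChannelReader so the hand-off can be tried without hardware; run_stream is the untested hardware path.

```python
import queue

import numpy as np


def make_callback(reader, n_chan, out_queue):
    """Build an every-N-samples callback that pushes one block per event."""

    def callback(task_handle, every_n_samples_event_type, number_of_samples, callback_data):
        # Allocate a fresh block per event so queued blocks are never overwritten.
        block = np.zeros((n_chan, number_of_samples), dtype=np.float64)
        reader.read_many_sample(block, number_of_samples_per_channel=number_of_samples)
        out_queue.put_nowait(block)
        return 0  # the callback is expected to return an integer status

    return callback


class FakeReader:
    """Hypothetical stand-in for AnalogMultiChannelReader (no hardware needed)."""

    def read_many_sample(self, data, number_of_samples_per_channel, timeout=10.0):
        data[:] = 1.0  # pretend every channel reads 1.0 V
        return number_of_samples_per_channel


def run_stream(out_queue, stop_event, n_chan=3, rate=1000, chunk=1000):
    """Hardware path (untested sketch): needs nidaqmx and a device named Dev3."""
    import nidaqmx
    from nidaqmx.constants import AcquisitionType
    from nidaqmx.stream_readers import AnalogMultiChannelReader

    with nidaqmx.Task() as task:
        task.ai_channels.add_ai_voltage_chan(f"Dev3/ai0:{n_chan - 1}")
        task.timing.cfg_samp_clk_timing(rate, sample_mode=AcquisitionType.CONTINUOUS)
        reader = AnalogMultiChannelReader(task.in_stream)
        callback = make_callback(reader, n_chan, out_queue)
        task.register_every_n_samples_acquired_into_buffer_event(chunk, callback)
        task.start()
        stop_event.wait()  # block until the GUI thread sets the event
        task.stop()


# Hardware-free demonstration of the queue hand-off.
q = queue.Queue()
cb = make_callback(FakeReader(), n_chan=3, out_queue=q)
status = cb(None, None, 1000, None)
block = q.get_nowait()
```

In this sketch the queue is still the bridge to the GUI thread; the callback only replaces the polling while loop, so it does not add a second buffering step on the Python side.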

v_pondfroth · ‎08-20-2024

Thank you for the feedback! The GUI does two things: configure streaming and display data in real time. I want a fast execution rate. Regarding "register_every_n_samples_acquired_into_buffer_event", how would this work with a callback? I am currently writing to a buffer Queue() [https://docs.python.org/3/library/queue.html#queue.Queue.put_nowait] from inside a while loop that runs until an event occurs (the event is triggered by a button push on the GUI).

I can't understand from the documentation [https://nidaqmx-python.readthedocs.io/en/latest/task.html#nidaqmx.task.Task.register_every_n_samples...] how "register_every_n_samples_acquired_into_buffer_event" and its callback are expected to work. Would I still use a queue to transfer the data between threads? It seems like this is introducing a second buffer step that could be slowing things down or creating problems for myself.

ft_06 · ‎08-20-2024

Sorry, I did not mean that you should use register_every_n_samples_acquired_into_buffer_event(). I had in fact put in bold the values used in the API calls: the sample rate is 1 kHz and the callback is registered for 1000 samples, meaning data handling is performed every second. So basically, handle big chunks of data, not too often.

I should have used the other example in nidaqmx-python (sampling rate is 1 kHz), which leads to the same conclusion:

while True:
    data = task.read(number_of_samples_per_channel=1000)

I don't know enough of the API internals to recommend one API over the other, but I would expect read_many_sample(Nsample) to be efficient, that is, your program is not polling but is woken up once Nsample samples have been acquired.

As you want to display data in real time, you will have a tradeoff on the Nsample value (typical stuff given my Power & Perf engineering background). You are probably too low and should go for hundreds of ms; this should still give a good user experience (this is not a first-person shooter, always challenge the need 😉) and relax the processing constraints.

The display framework must also be efficient; a lot of factors come into play.

In the end, a fast execution rate may simply not be possible in Python; C and an efficient data display framework would be needed (dev/setup complexity vs. performance).

At the least, you should start with a refresh period that is too long, like 1 s, to ensure you can acquire all the channels, then reduce the period in steps to find the most responsive setting you can get in Python. This may suit you in the end.
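To put numbers on the tradeoff, here is a small sketch of the arithmetic: the loop period is the number of samples per read divided by the per-channel sample rate, so 30 samples at 1 kHz gives 30 ms. The helper names and the list of candidate periods are hypothetical, chosen only to illustrate stepping down from 1 s.

```python
def read_period_ms(samples_per_read, sample_rate_hz):
    """Time between reads, in milliseconds, for a given chunk size."""
    return 1000.0 * samples_per_read / sample_rate_hz


def samples_for_period(period_s, sample_rate_hz):
    """Samples per channel to request so that one read spans period_s seconds."""
    return max(1, round(period_s * sample_rate_hz))


# The setting from the question: 30 samples per channel at 1 kHz.
current_period = read_period_ms(30, 1000)

# Candidate chunk sizes when stepping the refresh period down from 1 s.
candidates = [samples_for_period(p, 1000) for p in (1.0, 0.5, 0.3, 0.1)]
```

Each candidate would replace Nsamples in the read call; the smallest one that still keeps up with all channels is the most responsive setting available.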

ft_06 · ‎10-16-2024

Adding some more information now that I have really played with continuous acquisition:

- I thought that the device was using the biggest buffer size it could allocate, as a huge circular buffer. I could not find what this maximum buffer size was, but I was thinking of something like 2x 1 s of the maximum number of samples that can be acquired (I have a 1 MS/s NI DAQ card).

- I did a design with a 1 kHz sampling rate, 5 channels, and an event every 1000 samples. It worked OK. Then I tested with 10 kHz and 10,000 samples. It stopped working.

- Then I rechecked some other topics and the doc of cfg_samp_clk_timing():

"samps_per_chan (Optional[int]) – Specifies the number of samples to acquire or generate for each channel in the task if sample_mode is FINITE_SAMPLES. If sample_mode is CONTINUOUS_SAMPLES, NI-DAQmx uses this value to determine the buffer size. This function returns an error if the specified value is negative."

Well, it is not clear (is the buffer size 1x samps_per_chan, 2x, 4x, ...?), but at least it gives some control.

Therefore I am now doing designs with samps_per_chan representing 2 s of acquisition at the given sampling rate, and in parallel registering for half that number of samples (to wake up every second) through register_every_n_samples_xxx().

This can also be done for "read": just define samps_per_chan and call read(samps_per_chan/2).

Note: this parameter of cfg_samp_clk_timing can also be changed through the samp_quant_samp_per_chan property.
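The sizing rule described above could be sketched as follows. plan_buffer and acquire are hypothetical names, the "Dev3" channel string is borrowed from the original question, and the 2x factor is simply the rule of thumb from this post rather than anything the documentation guarantees. The acquire function is an untested hardware path and is not called here.

```python
def plan_buffer(sample_rate_hz, wakeup_s=1.0, buffer_factor=2):
    """Return (samps_per_chan, chunk): request a buffer of buffer_factor
    wake-up periods and read one wake-up period at a time."""
    chunk = int(round(sample_rate_hz * wakeup_s))
    samps_per_chan = buffer_factor * chunk
    return samps_per_chan, chunk


def acquire(stop_event, handle_block, n_chan=5, rate=10_000):
    """Hardware path (untested sketch): needs nidaqmx and a device named Dev3."""
    import numpy as np
    import nidaqmx
    from nidaqmx.constants import AcquisitionType
    from nidaqmx.stream_readers import AnalogMultiChannelReader

    samps_per_chan, chunk = plan_buffer(rate)
    with nidaqmx.Task() as task:
        task.ai_channels.add_ai_voltage_chan(f"Dev3/ai0:{n_chan - 1}")
        task.timing.cfg_samp_clk_timing(
            rate,
            sample_mode=AcquisitionType.CONTINUOUS,
            samps_per_chan=samps_per_chan,  # buffer size request, 2 s of data
        )
        reader = AnalogMultiChannelReader(task.in_stream)
        task.start()
        while not stop_event.is_set():
            block = np.zeros((n_chan, chunk))
            # Returns once `chunk` samples per channel are available (about 1 s).
            reader.read_many_sample(block, number_of_samples_per_channel=chunk)
            handle_block(block)


# The two configurations mentioned above: 1 kHz and 10 kHz.
plan_1k = plan_buffer(1_000)
plan_10k = plan_buffer(10_000)
```

With these values the 10 kHz case requests a 20,000-sample buffer per channel and reads 10,000 samples at a time, instead of relying on whatever default buffer size the driver picks.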
