12-03-2024 04:33 AM
Hello,
While checking the scalability of the finite acquisition for very long acquisitions, I got some error message that DAQmx_Buf_Input_BufSize was too big and should be < 0xFFFFFFFF or 0x1FFFFFFF.
I am using cfg_samp_clk_timing(.., samps_per_chan) or samp_quant_samp_per_chan to configure the duration of the test. However, I did not expect it to also allocate a buffer of the same size. This will clearly limit the maximum acquisition duration based on RAM rather than storage size/speed (OK, the limit mostly bites at high sampling rates... and long acquisitions would probably run at lower rates). Is my understanding correct?
I do not need very high accuracy on the acquisition time. If I want to scale in time, should I then use only continuous mode and stop based on a timer or on a callback that counts the acquired samples? In that case, according to the doc, samp_quant_samp_per_chan is used to determine the buffer size, so it can be set to something much smaller (of course, saving to storage must be fast enough to ensure this buffer can be emptied).
But then, in the finite acquisition case, one could also say that this parameter determines the buffer size, so I am not sure what the doc meant.
12-03-2024 11:28 AM
For finite acquisition, the total # samples you plan to acquire will *also* set the buffer size. DAQmx's assumptions for finite acquisition are to fill every slot in the buffer exactly once and never wrap around. Finite acquisitions most typically read all samples at once after acquisition is completed, so there needs to be a buffer slot for every one of them.
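To put rough numbers on that, here is a minimal sketch of the buffer arithmetic. The function name is hypothetical, and the 8 bytes per sample (one float64 per sample) is an assumption for illustration, not a documented figure for DAQmx's internal buffer.

```python
def finite_buffer_bytes(rate_hz, duration_s, n_channels, bytes_per_sample=8):
    """Estimate host memory needed if every sample of a finite acquisition
    gets its own buffer slot.

    bytes_per_sample=8 is an assumption (one float64 per sample).
    """
    samples_per_channel = int(rate_hz * duration_s)
    return samples_per_channel * n_channels * bytes_per_sample


# Example: 100 kS/s on 4 channels for one hour
one_hour = finite_buffer_bytes(100_000, 3600, 4)
print(f"{one_hour / 1e9:.2f} GB")  # 11.52 GB
```

Under those assumptions, an hour at 100 kS/s on a few channels already asks for more memory than many machines can spare, which is why RAM becomes the limit long before storage does.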
So yes, if you don't need extreme precision in your total # samples or acquisition time, you can configure instead for continuous acquisition. In this mode, DAQmx will (if necessary) make a decent guess at a buffer size and keep wrapping around to refill it. You as the app programmer must "stay ahead" of this process by reading samples out of the buffer as the acquisition continues, opening up buffer slots for new samples to fill. And so on.
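The "stay ahead" loop could look roughly like the sketch below. It is hardware-free on purpose: read_chunk is a hypothetical callable standing in for the real read (for example something like lambda n: task.read(number_of_samples_per_channel=n) in the nidaqmx Python API), and the function name is made up for illustration.

```python
def acquire_at_least(read_chunk, total_samples, chunk_size):
    """Read fixed-size chunks until at least total_samples have been read.

    read_chunk is a hypothetical stand-in for the real blocking read.
    Returns the number of samples per channel actually read, which is
    total_samples rounded up to a multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    acquired = 0
    while acquired < total_samples:
        data = read_chunk(chunk_size)  # assumed to block until chunk_size samples exist
        acquired += chunk_size
        # ... write `data` to storage here so the buffer slots can be reused
    return acquired


# Dry run without hardware: the stand-in reader just returns n zeros.
fake_read = lambda n: [0.0] * n
print(acquire_at_least(fake_read, total_samples=250_000, chunk_size=10_000))  # 250000
```

The total is only precise to within one chunk, which matches the "no extreme precision needed" case discussed here.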
With many devices, there's a "sneaky" way to get both a huge and precise # of total samples by declaring & servicing a continuous acquisition while generating a finite sample clock with a counter. See this recent thread where I described it in more detail. (Note: though the linked thread is about generating output, most of the same ideas carry over to capturing input.)
-Kevin P
12-04-2024 04:10 AM
Hello,
I will mark your post as the solution. Fortunately, I don't need accuracy, and a timer has been tested as sufficient.
Do you have more insight into the real-time constraints of continuous mode? I tried some acquisitions at a 100 kHz sampling rate and found them prone to failing under any perturbation (if I prevent perturbations, it can run for tens of minutes).
For example, it was failing after 5 or 6 s, even when I increased samp_quant_samp_per_chan to 8 s of samples, then 16 s. Thinking about it, I can identify these RT constraints:
- 2K-sample onboard buffer -> this one has to be dumped into the buffer in RAM
- 8 or 16 s buffer of samples (per channel) in RAM. It is filled from the device's onboard buffer and emptied within my callbacks
My callbacks use about 4 to 5% CPU load (and I set the buffer size to 8 or 16 s, yet it fails after 5 to 6 s), so I have the feeling that it is the emptying of the onboard buffer that fails. What can we check for that?
I will also try not reading the data in my callback to see whether that fills the RAM buffer.
12-04-2024 10:12 AM
A longstanding general rule of thumb is that most continuous acquisitions behave well when you continually request ~0.1 sec worth of samples per read. That'll fire your callbacks for reading 10 times a second.
With app memory being so plentiful these days, I habitually make my buffers at least 5-10 seconds worth of samples even though 1 second would probably be plenty. The key is to make the buffer considerably bigger than the # samples you read and to make sure that each read is pretty much retrieving all the accumulated samples.
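As a rough illustration of that sizing rule in Python, here is a minimal sketch. The function name and the default of 5 seconds of buffer are assumptions chosen from the ranges mentioned above, not DAQmx settings.

```python
def suggest_sizes(rate_hz, read_period_s=0.1, buffer_seconds=5.0):
    """Return (samples_per_read, buffer_samples_per_channel) for the rule of thumb:
    read about 0.1 s of samples at a time, and keep several seconds of buffer."""
    samples_per_read = max(1, round(rate_hz * read_period_s))
    buffer_samples = max(samples_per_read, round(rate_hz * buffer_seconds))
    return samples_per_read, buffer_samples


print(suggest_sizes(100_000))  # (10000, 500000)
```

At 100 kHz this gives reads of 10 000 samples, ten times a second, against a buffer fifty times larger than one read.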
-Kevin P
12-04-2024 10:55 AM
Thanks.
I am used to these tradeoffs on embedded platforms (worked on 2.75G modems, audio playback, ...) and I still find it quite empirical in the nidaq case because of these observations:
- I can set a huge buffer size like 1000x instead of 4x or 8x... it still fails after 5 s at 100 kHz, and the buffer is nowhere near full
- if my callbacks are triggered 2x more often, it improves a lot. The same happens if I process less in the callback (even though processing was already only about 4% CPU)
- The API does not care whether I call task.read() or not. Simply running the callback and returning from it seems to move the "read/write" pointers and thus free space for the next samples.
That is why I am a bit confused about which buffer is copied into which, and which buffer the read draws from.
In the end, your rule of thumb does not work for me (although it is exactly the typical rule of thumb for this kind of use case) if I stick to a callback every 1 s.
My assumption is that the copy from the onboard buffer to a RAM buffer is hidden, and that only buffer size and callback rate should be linked. So I should never fail at 5 s when my buffer holds something like 1000 s 😉
But I will tune my callback to a higher rate since it improves the situation, even if I do not understand the rationale.
12-05-2024 09:50 AM - edited 12-05-2024 09:54 AM
It'll be helpful for you to post the code. Mainly all the stuff that configures your task and then the whole callback function.
I don't know if I personally can help much b/c I only program in LabVIEW and don't know my way around the text language syntax very well. Especially as it might relate to the mechanism used for DAQmx-based callbacks. But there are other knowledgeable people around here who can help.
Much of what you describe cuts against a longstanding body of common knowledge around here, so I'm interested to see whether a sensible explanation can be found.
-Kevin P
P.S. When your code "fails", what is the error number and error text?
12-06-2024 03:22 AM
Let's have a try 😉
I have several devices (another topic helped me set up a master and slaves and use RTSI to sync them).
master_samp_clk is 100 kHz in the failing case.
master_task.timing.cfg_samp_clk_timing(rate = config_object.master_samp_clk)
for slave_task in all_slave_tasks:
    slave_task.timing.cfg_samp_clk_timing(rate = config_object.master_samp_clk)
buffer_size is 8 times the number of samples per second on 1 channel. I also tried 1000x and it did not improve.
The callback used to be registered for every 1 s, that is, every master_samp_clk samples. Dividing that by 2 and now by 4 (250 ms) improves things a lot.
for index_task, task in enumerate(all_tasks):
    task.timing.samp_quant_samp_mode = nidaqmx.constants.AcquisitionType.CONTINUOUS
    task.timing.samp_quant_samp_per_chan = 8 * config_object.master_samp_clk # used for buffer size
    if config_object.has_ui == True:
        task.register_every_n_samples_acquired_into_buffer_event(config_object.master_samp_clk, functools.partial(callback_nidaq, index_task))
The callbacks add data to PowerRails classes, which are then used in another thread to draw. The lock is there because one device/callback measures U and the other measures I; I have clearly seen them run in parallel, and since I compute P = U*I they are linked. So the lock is defensive.
def callback_nidaq(index_task, task_handle, every_n_samples_event_type, number_of_samples, callback_data):
    # find nidaq task associated to callback
    task = all_tasks[index_task]
    ...
    # stop here if task is stopped as we can't read data without error
    # lock is here to handle a stop while doing a read
    myLock.acquire()
    if task.is_task_done() == True:
        myLock.release()
        return 0
    data = task.read(number_of_samples_per_channel = number_of_samples)
    myLock.release()
    # update dataset for each channel in task
    # 1 - we find the right power_rail and V or I for this data thanks to channel name
    for index_channel, channel in enumerate(task.ai_channels):
        for index_rail, power_rail in enumerate(all_rails):
            if power_rail.v_daq_name == channel.name:
                myLock.acquire() # lock is to protect callbacks from multiple devices between themselves as P can be touched by all devices (unlike V and I)
                # time_range: all rails are at time cb_steps * samples * sampling increment. Then each rail has an offset
                time_range = range(
                    cb_steps[index_task] * number_of_samples * power_rail.time_increment + power_rail.time_offset,
                    (cb_steps[index_task] + 1) * number_of_samples * power_rail.time_increment + power_rail.time_offset,
                    power_rail.time_increment)
                power_rail.dataset_x_v += time_range
                power_rail.dataset_y_v += data[index_channel]
                myLock.release()
                # only 1 power rail per ai channel, we can stop the search loop
                break
            if power_rail.i_daq_name == channel.name:
                # ... Same for I
    cb_steps[index_task] += 1
    # GUI update
    if mainWin.get_ui_state() == True:
        mainWin.signal_update_plots(index_task, cb_steps[index_task], number_of_samples)
    return 0
12-06-2024 10:23 AM
I can roughly follow substantial parts of the code, but don't have any knowledge of your language's callback and lock/unlock mechanisms.
That said, I'm a little suspicious that the lock/unlock is leading to access contention and blocking conditions that contribute to the problem you see. You lock before reading, then unlock after read completes.
When asking for a lot of samples per read, the function will block until that number of samples is available. Now in theory, that shouldn't ever happen inside your callback because the callback should only fire *after* that number *is* available. But maybe it's worth a little debug line in your code to query for the # available samples before reading? That should be a near-constant value that's probably just a little higher than the # samples that's supposed to trigger the callback. If it behaves differently, that'll be a helpful clue.
It shouldn't ever be smaller than the triggering number. But you also don't want to see it being much bigger, especially not if that # keeps growing. If it *does* keep growing, you've got some kind of issue that's slowing your ability to process your way through these callbacks. Based on my experience with DAQmx and parallel processes, I'm not all that inclined to suspect that it's purely a DAQmx problem. So I wonder about the lock/unlock mechanism and whether other parallel processes are interfering with your callbacks by acquiring those locks and holding them so long that your callbacks get delayed and queued up.
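One way to turn that debug check into something loggable is sketched below in Python. It is only an illustration: the function name, the "three rising readings" test, and the 50% tolerance are all assumptions, and the available count would have to be queried from the task just before the read (in the nidaqmx Python API that is presumably the task.in_stream.avail_samp_per_chan property).

```python
def classify_backlog(available, trigger_n, history, tolerance=0.5):
    """Classify the backlog seen at the start of one callback.

    available : samples waiting in the buffer, queried just before the read
    trigger_n : the N registered for the every-N-samples event
    history   : list of earlier `available` values; appended to in place
    tolerance : assumed threshold; more than trigger_n * (1 + tolerance) counts as "late"
    """
    history.append(available)
    if available < trigger_n:
        return "unexpected: fewer samples than the trigger count"
    # Three readings in a row, each larger than the last, suggest the
    # callbacks are falling further and further behind.
    if len(history) >= 3 and history[-3] < history[-2] < history[-1]:
        return "growing"
    if available > trigger_n * (1 + tolerance):
        return "late"
    return "ok"


history = []
for seen in (100_100, 100_500, 101_000):
    print(seen, classify_backlog(seen, 100_000, history))
```

A steady stream of "ok" would match the expected near-constant value; "growing" or "late" would point to callbacks being delayed.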
That bit of "thinking out loud" doesn't seem to totally fit your observations, but maybe it's a launch point for better ideas.
-Kevin P
12-09-2024 03:32 AM
Hello,
- Locks are always scary. They should not have an impact since their use here is very controlled, but you are right to mention them, so I disabled them (and I am keeping them disabled for the rest of the testing)... and that did not improve anything at all 😞
- I found the API task.in_stream.avail_samp_per_chan. Here are the observations (every second, the callback should get 100 000 samples):
* available samples do not go higher than 104 000 samples
* most of the time, it is around 100 200 samples max
* When it crashes, the value was often around 103 000 just before. But I also saw 104 000 with no crash... followed by a return to smaller values
Then I went back to a callback every 250 ms (25 000 samples per call):
* no crash at all
* it is very often significantly higher than 25 000, e.g. 33 000
* it can go up to 42 000 available samples... then back to 27 000, then back to 40 000 => I am then very late in handling samples... but the framework is happy (?????!!!!!!)
- That leaves me a bit puzzled (and you very graciously and humbly acknowledge that your very wise recommendations, which I fully agree with, do not fit my observations)
It gives me the impression that there is some kind of buffer in between the 2K-sample onboard buffer and the buffer defined by samp_quant_samp_per_chan (which I already set to about 1000x the number of samples per callback). If it held something like 110 000 samples or less, that would leave me a margin of only 10 000 samples to get the callback and read the data.
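To make that margin concrete, here is the arithmetic as a small Python sketch. The 110 000-sample limit is only the guess made above, not a documented DAQmx value, and the function name is made up.

```python
def margin_seconds(limit_samples, samples_per_callback, rate_hz):
    """Time left to service a callback before a hypothetical backlog limit is hit."""
    return (limit_samples - samples_per_callback) / rate_hz


rate = 100_000
print(margin_seconds(110_000, 100_000, rate))  # 0.1  -> 100 ms of slack with 1 s callbacks
print(margin_seconds(110_000, 25_000, rate))   # 0.85 -> 850 ms of slack with 250 ms callbacks
```

If such a limit existed, it would explain why a huge buffer does not help with 1 s callbacks while 250 ms callbacks tolerate being quite late.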
That would be a "badly" implemented streaming/callback mechanism. But we don't know the internals of the framework (hmm... I could try some kernel traces... or contact NI directly... but I am not sure it is worth the effort), so I think I will keep relying on your empirical observations, which brought me much more stability.
12-09-2024 03:58 AM - edited 12-09-2024 04:10 AM
Well, with a callback every 250 ms (25 000 samples), I can go up to 85 000 and not crash. Then it goes back to 25 000.
Often, you see the available samples decrease by 23 000 or 24 000 in that case, which points to the next callback being called immediately because you are (very) late. So I finally added a timestamp and it confirms this: callbacks queue up as expected and you can recover from it, then fall behind again, recover... Well, the usual real-time balancing act.
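The timestamp check can be sketched like this. The timestamps below are made-up values of the kind time.perf_counter() would give at callback entry, and the function name and the 0.5 threshold are assumptions for illustration.

```python
def find_catch_up_calls(timestamps, period_s, fraction=0.5):
    """Return indices of callbacks that fired much sooner than one period
    after the previous one.

    A short gap suggests the event was already pending, i.e. the previous
    callback ran late. fraction=0.5 is an assumed threshold.
    """
    return [
        i
        for i in range(1, len(timestamps))
        if timestamps[i] - timestamps[i - 1] < fraction * period_s
    ]


# Made-up callback entry times for a 250 ms period: one long stall, then a burst.
stamps = [0.00, 0.25, 0.50, 1.10, 1.12, 1.14, 1.25]
print(find_catch_up_calls(stamps, period_s=0.25))  # [4, 5, 6]
```

A burst of flagged indices right after a long gap is the "late, then recover" pattern described above.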
I also tried 500 ms, where it can still crash. That again seems to point towards a real-time limit of about 100 000 samples.