LabVIEW Idea Exchange

Nate_Moehring

new primitive to deallocate unused memory from queues

Status: New

[Image: Deallocate Queue Memory.PNG]

 

Many people do not realize that memory allocated by a queue is never deallocated until the queue is destroyed or the call chain that created the queue stops running.  This is problematic for queues that are opened at the beginning of the application and used throughout its lifetime, because every queue retains its maximum historical size, so the application can end up holding a lot of memory that is unused or seldom used.

 

Consider a consumer that occasionally lags behind, causing the queue to grow tremendously.  The consumer then catches back up and services the backlog in a short period of time.  The queue is unlikely to be that large again for quite some time, but unfortunately none of its memory is deallocated.

 

I'd like a primitive that deallocates all of that memory down to just the current number of elements in the queue.  Since the queue won't need that much memory again for a long time, and it will auto-grow again as needed, I'd like to recover that memory now instead of waiting for the application to be restarted (which is currently the only time the queue is created).

 

The alternative is to add code that periodically force-destroys the queue and have the consumer gracefully handle the error and open a new reference, then replicate this change for all queues.  That seems messy and puts too much responsibility on the consumer.  I'd rather just periodically call a 'deallocate queue memory' primitive on the queue reference within the producer, perhaps once every few minutes, just to be sure none of the queues are needlessly holding a large amount of memory.
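
A minimal sketch of that usage, assuming the primitive existed (Python stands in for the block diagram here; 'deallocate_unused_memory' is the hypothetical node, not an existing LabVIEW API):

    import time

    TRIM_INTERVAL_S = 180                             # "once every few minutes"

    def producer_loop(queue, source, deallocate_unused_memory):
        """Normal producer loop plus a periodic call to the proposed primitive."""
        last_trim = time.monotonic()
        for item in source:
            queue.put(item)                           # ordinary Enqueue
            if time.monotonic() - last_trim >= TRIM_INTERVAL_S:
                deallocate_unused_memory(queue)       # shrink buffer to the live element count
                last_trim = time.monotonic()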

 

I believe this will:

  • Improve performance in other areas of the application because less time will be spent looking for available memory to allocate.
  • Reduce the chance of Out of Memory errors because large blocks of memory will not be held [much] longer than they are needed.
  • Improve the common user opinion that LabVIEW applications are memory hogs or leaky.

I realize this will hurt enqueue performance when the queue begins to grow quickly again, but this area is not a bottleneck for my application.

 

Thanks!

Nate Moehring

SDG
12 Comments
G-Money
NI Employee (retired)

You should only allocate as much memory as is needed. If you find that you are not using all the memory, then reduce the size of the queue. If you switch to dynamically allocating memory during run time instead of creating one contiguous block of memory before run time, you will see a large performance decrease. I would be against adding this primitive and would instead point people to proper memory management in LabVIEW (http://zone.ni.com/reference/en-XX/help/371361F-01/lvconcepts/memory_management_for_large_data_sets/). 

Nate_Moehring
Active Participant

I'm well aware of those techniques.  I am not preallocating any memory.  My application is on a desktop, not RT.  I need variable-sized queues, not fixed-size queues (fixed-size queues don't preallocate memory on desktops either).  And I'm not stuffing them with fill data and flushing the data to "preallocate" them.  Nominally these queues have very few elements.

 

I don't mind that the queues occasionally take a little extra time to allocate additional memory to accommodate the increased number of elements being put into them; I think that's great.  My request is for a way to force them to shrink again so they don't continue to use a large amount of memory even when they have very few elements in them.

Nate Moehring

SDG
David S.
NI Employee (retired)

Well spoken, Nate. I'm going to tentatively kudo this.

David Staab, CLA
Staff Systems Engineer
National Instruments
AristosQueue (NI)
NI Employee (retired)

G-Money -- those approaches won't fix Nate's issue. He needs a solution for the "rarely fast producer, always slow consumer" problem.

 

Nate -- it's a worthy suggestion. I'll kudos it.

AristosQueue (NI)
NI Employee (retired)

I started thinking about this... a question... Let's call this new primitive "Reduce Queue Buffer" or RQB for short.

 

Where would you put this primitive?

 

Some piece of code somewhere needs to periodically check the size of the queue (using Queue Status). That seems like it needs to be in the Consumer loop. It would need to keep a running average of the queue size and the largest the queue had ever been. If, at some point, that code notices that the current size is substantially below the largest size, it would trigger this primitive.

 

Problems: Between the "Queue Status" and the RQB call, the producer loop could have shoved more data into the queue. And even if it didn't, the Queue Status and the RQB are two additional locks on the queue above and beyond the one lock that the Consumer loop already needs when it calls Dequeue. And you're going to need to have all that "running average and max queue size" logic every time you use this primitive.
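
In rough pseudocode (Python here, since we can't paste G), that consumer-side accounting might look something like this; 'queue_status' and 'reduce_queue_buffer' are the hypothetical calls, and the 25% threshold is arbitrary:

    class ReduceDecider:
        """Tracks a running average of the queue size and the largest size seen,
        and reports when the current size is substantially below that peak."""
        def __init__(self, factor=0.25):
            self.factor = factor
            self.samples = 0
            self.mean = 0.0
            self.peak = 0

        def should_reduce(self, current_size):
            self.samples += 1
            self.mean += (current_size - self.mean) / self.samples   # running average
            self.peak = max(self.peak, current_size)
            return current_size < self.factor * self.peak and current_size <= self.mean

    # In the consumer loop, after each Dequeue:
    #     if decider.should_reduce(queue_status(q)):   # extra lock #1 (Queue Status)
    #         reduce_queue_buffer(q)                   # extra lock #2 (the RQB call)
    #         decider.peak = 0                         # start tracking the next burst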

 

Thoughts: What if the new primitive was more sophisticated than just "reduce to current size"? What if it was "reduce to current size if the current size is significantly below the running average"?

 

Problems: It would need the same "with memory" options that the Advance Notifier Wait nodes have, to handle swapping which queue refnum is currently being handled.

 

More thoughts later... going to dinner now...

AristosQueue (NI)
NI Employee (retired)

Further characterization of the original problem:  The situation where this primitive would be useful is slightly more specialized than I characterized earlier (though Nate did characterize it right in his original post). You need a "producer that is rarely faster than a consumer", true, but when the burst is over, you ultimately need the consumer to be faster than the producer. Why? Because if it only gets down to "about the same speed", then the consumer never catches up, and you need the entire buffer as allocated. Also, if your bursts are frequent and generally the same size, you haven't really saved anything.

 

So, to make this work, let's assume that you have a producer that runs at speed X most of the time but sometimes runs at anywhere between 10X and 100X, and that you have a consumer that always runs at X+N.

 

Idea variation: Instead of adding a new primitive, what about a new option on "Obtain Queue" to create a "bursty queue"? That would make a new queue that knows to track its own internal measurements and automatically reduce the queue buffer size during a Dequeue operation if the Dequeue made the element count drop back to the running average. That way the accounting (running average, current count, local maximum) is all taken care of automatically, and you don't have the additional lock overhead (though, of course, you do hold the one lock for a bit longer) or the race condition (the producer inserting more between Queue Status and RQB). It's not as flexible a solution as the RQB solution, but it does seem to more directly address the problem you're aiming to solve. 
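
To make the bookkeeping concrete, here's a toy model of what a "bursty queue" might do internally (Python as pseudocode; none of these names exist in LabVIEW today, and the real queue would do this inside its own memory manager):

    import collections

    class BurstyQueue:
        """Toy model: the queue tracks its own running average and shrinks its
        buffer during Dequeue once the element count falls back to that average."""
        def __init__(self, window=64):
            self._items = collections.deque()
            self._capacity = 0                          # stands in for allocated slots
            self._recent = collections.deque(maxlen=window)

        def enqueue(self, item):
            self._items.append(item)
            self._capacity = max(self._capacity, len(self._items))
            self._recent.append(len(self._items))

        def dequeue(self):
            item = self._items.popleft()                # raises if empty; fine for a sketch
            count = len(self._items)
            self._recent.append(count)
            average = sum(self._recent) / len(self._recent)
            if count <= average and self._capacity > count:
                self._capacity = count                  # release the burst headroom
            return item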

 

Another variant: We could create the basic RQB primitive, and then we could also have a more complex VI in the palettes that wraps up all the accounting functionality in uninitialized shift registers.

 

Side note: I don't want to imply that the race condition or the "multiple locks performance hit" are major issues. They're issues I can see off the top of my head. I don't know how big a deal they are. They may be red herrings in the greater scheme of things.

AristosQueue (NI)
NI Employee (retired)

You could write code to call RQB based on a particular value being Dequeue'd. In other words, when the producer bursts, it would enqueue some sentinel value that tells the consumer "when you see this value, the burst is over, so call the RQB primitive."
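
Sketched as a consumer loop (Python as pseudocode; 'reduce_queue_buffer' again stands in for the hypothetical RQB primitive):

    BURST_OVER = object()       # sentinel the producer enqueues when its burst ends
    SHUTDOWN = object()         # sentinel for normal shutdown

    def consumer_loop(queue, process, reduce_queue_buffer):
        while True:
            item = queue.get()
            if item is SHUTDOWN:
                break
            if item is BURST_OVER:
                reduce_queue_buffer(queue)   # burst is over: give the memory back
            else:
                process(item)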

 

That might be a better usage pattern than the running average accounting I described earlier.

 

Nate: How would you use that primitive?

AristosQueue (NI)
NI Employee (retired)

Another variant: I got to thinking... putting the onus on the Consumer loop is kind of sucky, since it's the Producer that knows when the burst occurs. Suppose we added "Record Start of Burst", which would cause the queue to remember its current number of elements, and "Record End of Burst", which would flag the queue so that when the Consumer calls Dequeue and the number of elements is back to the same as (or lower than) the value recorded at Start of Burst, the queue buffer deallocates back to the size recorded at Start.

You need both primitives -- Start and End -- to avoid a race condition. If you only had "Start" and then said "the next Dequeue that takes me to the same or lower size should do the deallocate", you might call Start and the consumer might do a Dequeue immediately, before the burst ever got going. I think you also need End to help clarify the "what happens when Start is called twice in a row" situation. Maybe.
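
A toy model of that Start/End pairing, just to show the intended bookkeeping (Python as pseudocode; neither primitive exists today):

    import collections

    class BurstAwareQueue:
        def __init__(self):
            self._items = collections.deque()
            self._capacity = 0            # stands in for allocated slots
            self._burst_floor = None      # element count at Record Start of Burst
            self._burst_over = False

        def record_start_of_burst(self):  # called by the producer
            self._burst_floor = len(self._items)
            self._burst_over = False

        def record_end_of_burst(self):    # called by the producer
            self._burst_over = True

        def enqueue(self, item):
            self._items.append(item)
            self._capacity = max(self._capacity, len(self._items))

        def dequeue(self):
            item = self._items.popleft()
            # Deallocate only after End of Burst was recorded AND the consumer
            # has drained back down to (or below) the recorded floor.
            if (self._burst_over and self._burst_floor is not None
                    and len(self._items) <= self._burst_floor):
                self._capacity = self._burst_floor
                self._burst_floor, self._burst_over = None, False
            return item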

 

Anyway, that's enough alternatives for tonight. See if you can come up with ways you'd want to actually use the RQB prim, and comment on any/all of the variants I've thrown together.

fabric
Active Participant

@AQ: Just found a couple of hours to read through all your responses ;)... The most interesting option to me is the idea of a "bursty queue". I like this idea!

 

It seems that a common usage scenario could be:

  1. Create a fixed-size queue (just as you currently would) - let's call this the "base size".
  2. Have some way to specify a "burst size". It may suffice to specify the burst size during the Obtain Queue call, but I think it would turn out to be a much more powerful tool if you could redefine the burst size on the fly (e.g. via a Queue Properties primitive).
  3. Provide a mechanism to clear all unused/available memory beyond the base size. One way to achieve this might be to redefine the burst size to zero (thereby saving additional primitives); alternatively, provide a "deallocate" function as per Nate's original idea that recovers as much memory as possible in that instant.

I personally like the idea of being able to force the deallocation more than the "queue that knows to track its own internal measurements and automatically reduce the queue buffer size" idea, although I could live with automatic deallocation if it didn't cost too much in the way of queue performance. Maybe "automatically deallocate" could be an option on the Queue Properties primitive when the buffer size is specified...
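
As a rough illustration of that option set (Python as pseudocode; "base size" and "burst size" are proposed options, not anything Obtain Queue supports today):

    class SizedQueue:
        def __init__(self, base_size=0, burst_size=0):
            self.base_size = base_size
            self.burst_size = burst_size
            self._items = []
            self._capacity = base_size                  # stands in for allocated slots

        def enqueue(self, item):
            self._items.append(item)
            self._capacity = max(self._capacity, len(self._items))

        def set_burst_size(self, burst_size):
            """Redefine the burst size on the fly; zero doubles as 'deallocate now'."""
            self.burst_size = burst_size
            ceiling = max(self.base_size, len(self._items)) + burst_size
            if self._capacity > ceiling:
                self._capacity = ceiling                # release anything beyond the ceiling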

 

Looking forward to seeing where this one goes!

Nate_Moehring
Active Participant

AQ, sorry for my delay in responding to your ideas.  Thank you for taking the time to think through the problem and possible solutions.  I will try to address your ideas in the order they were received.  🙂

 

1.  I don't really like putting the burden of queue management on the consumer for two reasons.  First, I tend to think of the producer as being the owner of the queue, since it was likely the one that created the queue.  (I'm primarily thinking about queues that transfer data from point A to point B, not queues that are used for IPC.  In the case of IPC, it makes more sense that the consumer owns the queue.)  Second, as you stated, it's the producer that knows how much data is going into the queue.  The consumer is kind of blind and ideally can remain a "dumb client" of the queue.

 

2.  I'm not crazy about the idea of adding much or any overhead to queues.  It's not uncommon for my application to have 50,000 queues in use at once, so even a small performance hit may have a big impact on our application.  To avoid any overhead, a new simple primitive that just releases unused memory, which I can call when I deem necessary, seems like a good no-frills solution.  To avoid a performance hit caused by calling this primitive when there really isn't very much memory that can be freed, perhaps the primitive could also take a threshold argument.  For example, "if the queue is > 75% of its current capacity, don't free any memory.  Otherwise, free all the available memory."  It could also take a second parameter that says how much margin to retain: "If the queue is <= 75% of its current capacity, free the available memory but leave a 10% margin."  These two options could greatly reduce the performance hit caused by the memory deallocation (and avoid immediate reallocation on the next enqueue), yet still avoid any overhead added by "queue accounting", and the queue wouldn't be locked except when the primitive is called.
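
In pseudocode, the threshold and margin arguments might work like this (Python as a sketch; 75% and 10% are just the example numbers above, and the primitive itself is hypothetical):

    def slots_to_keep(element_count, capacity, threshold=0.75, margin=0.10):
        """Decide how many element slots the queue should keep allocated."""
        if element_count > threshold * capacity:
            return capacity                          # still mostly full: free nothing
        return int(element_count * (1 + margin))     # free the rest, keep ~10% headroom

    # Example: a queue that grew to 1,000,000 slots during a burst but now
    # holds only 200 elements would shrink back to 220 slots.
    print(slots_to_keep(200, 1_000_000))             # -> 220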

 

3.  You made a good point that the issue is the producer / consumer speed ratio.  As the ratio increases, whether that's a faster producer or a slower consumer, more memory is consumed.  But when the ratio becomes very small, no memory is freed (currently).  Yes, you always want the consumer to be substantially faster than the producer if possible, otherwise you have a potential memory problem.

 

4.  The bursty queues sound nice; automatic memory management of the queues.  I like this option in theory but I'm a little worried about the overhead.

 

5.  Without adding extra logic to the producer, the producer doesn't necessarily know when it's bursting.  For example, some of my producers are nothing more than TCP/IP clients.  They read frames from TCP/IP, then enqueue them to be processed.  The TCP/IP clients run as fast as they can to make sure data isn't dropped and the TCP/IP buffers don't overflow.  So in reality the producers aren't really producing the data; they're just one link in the chain.  I could add logic to quantify how fast they're enqueueing data, and therefore label them as "bursting" or "not bursting", but in reality the data rate is extremely variable and out of our control.  And whether or not it's "bursting" also depends on the speed at which the consumer is executing; if it can keep up, there's no need to call it bursting.  Point is, I'm not a fan of trying to quantify the density of traffic to determine how the queue should behave, or when to call Start/End primitives, or when to enqueue sentinel values.  I'm also not a fan of using sentinel values; it just doesn't feel like an elegant solution.  But I know from experience that sometimes they are a good solution, all things considered.

 

6.  Defining a queue with a "base size" as fabric suggests is also a nice touch, but since I don't preallocate queues I would always set this to 0.

 

Thanks for your consideration,

Nate

Nate Moehring

SDG