So we think we've tracked down the issue. Hopefully this can help someone else in the future.
In processing our incoming data, we have a few places where we use for loops to operate on it. At some point in the past, these were changed to parallelized for loops. Seemed like a good idea at the time. However, our guess is that when you have a standard (i.e. non-indexing) tunnel into a for loop, LV makes a buffer copy for each parallel instance of the loop, and LV does not seem to handle that memory very well. Once we removed the parallelized for loops, the memory usage stabilized considerably.
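To put rough numbers on that guess, here is a back-of-the-envelope sketch in Python (not LabVIEW code). It assumes each parallel loop instance holds its own full copy of the array wired through the non-indexing tunnel, which is only our hypothesis and not confirmed LabVIEW behavior. The function name, the array size, and the instance counts are all made up for illustration.

```python
def estimated_peak_bytes(array_bytes: int, parallel_instances: int) -> int:
    """Rough estimate of memory held by copies of one non-indexed input.

    ASSUMPTION: every parallel loop instance gets its own full copy of the
    input array. This models our guess, not documented LabVIEW behavior.
    """
    if array_bytes < 0:
        raise ValueError("array_bytes must be non-negative")
    if parallel_instances < 1:
        raise ValueError("parallel_instances must be at least 1")
    return array_bytes * parallel_instances


if __name__ == "__main__":
    # Hypothetical input: 25 million double-precision samples (8 bytes each).
    array_bytes = 25_000_000 * 8
    for instances in (1, 4, 8):
        megabytes = estimated_peak_bytes(array_bytes, instances) / 1e6
        print(f"{instances} parallel instance(s): about {megabytes:.0f} MB")
```

If the guess is right, the cost scales linearly with the number of parallel instances, which would fit the memory settling down once the loops went back to running serially.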
Thank you to everyone for the suggestions.