OK, I took the liberty of staging a competition between the four code suggestions: Bob, Ben, Altenbach1 (upper code in image), and Altenbach2 (lower code in image).
The attached table shows some results with various array sizes and various fractions of zeroes.
Some comments:
The versions by Bob and Ben are only suitable if there are very few zeroes. Both use "delete from array", a very expensive function because for every zero found, all higher elements must be moved down one slot. Ben's is faster despite the fact that he uses 2 loops and reverses the array. Bob's version could beat Ben's by keeping the current index in a shift register and resuming each search from there. Right now, every search starts at element zero, so large portions of the array are searched over and over; e.g. the first element is searched as many times as there are zeroes in the entire array. Both depend strongly on the number of zeroes in the array. If there are no zeroes, they are very fast. They slow down dramatically as the number of zeroes increases.
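Since LabVIEW diagrams are graphical, here is a rough text sketch in Python of the two search strategies described above. It is not a translation of Bob's or Ben's actual diagrams; the function names are made up for illustration, and a Python list `del` merely stands in for "delete from array" (both shift all higher elements down one slot).

```python
def remove_zeroes_restart(values):
    """Hypothetical sketch: restart the search at element 0 after every delete."""
    out = list(values)
    while True:
        try:
            i = out.index(0)   # always scans from the start of the array
        except ValueError:
            return out         # no zero left
        del out[i]             # every higher element moves down one slot


def remove_zeroes_resume(values):
    """Hypothetical sketch: remember the current index (the 'shift register')."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] == 0:
            del out[i]         # still shifts the tail, but no rescan of the head
        else:
            i += 1
    return out
```

Resuming from the saved index removes the repeated scanning, but each delete still moves the tail of the array, so both variants remain sensitive to the number of zeroes.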
My simple version (upper panel, altenbach1) is pretty fast, but around inputs of 0.5-1MB it slows down dramatically. I have not explored the details, but most likely LabVIEW "underguesstimates" the final array size and must reallocate the output during the loop. Maybe there is some other reason (e.g. cache issues).
As expected, my second version (lower panel, altenbach2) is the only consistent performer. The times are independent of the fraction of zeroes and the execution time seems linear with input array size (up to 20MB tested!).
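For readers who cannot see the image, the following Python sketch shows one general way to get timing that is linear in the input size and independent of the fraction of zeroes. It is an assumption about the kind of approach that behaves this way, not a transcription of the altenbach2 diagram; the name `compact_nonzero` is hypothetical.

```python
def compact_nonzero(values):
    """Hypothetical sketch: single pass into a preallocated buffer, one trim at the end."""
    out = [0] * len(values)    # allocate the worst-case size once, up front
    write = 0                  # next free slot in the output
    for v in values:
        if v != 0:
            out[write] = v     # overwrite in place, nothing is shifted
            write += 1
    del out[write:]            # cut off the unused tail in a single operation
    return out
```

Every element is read once and written at most once, so the work does not depend on how many zeroes there are, and no reallocation happens inside the loop.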