Find the first unique element in an array

tshurtz · ‎05-17-2024

I am looking for some better understanding of the memory and time implications of using different methods of the following.

It is less about this case in specific but more an interest in knowing what the code does in the background.

In this specific case I want to find the first unique point in a 1D array of x,y points, then rotate the array to make that the first element.

Bellow are a few ways I came up with doing that (the second array of points also needs to be shifted in the same way but doesn't effect the decision).

As a bit more context, I expect the arrays to be on the order of 100 to 10000 points long and 99.99% of the time the first entry will be unique and so no change is needed. There is next to 0 chance that you have to go even a significant way through the array but you can't be certain. It will also be called thousands of times with different arrays. the precise order of the points is not super important but the pairing to the 2nd array must be maintained (in other words if you somehow identified the first unique element and swapped it with the 0th element and did the same for the matching indexes in the array b that would be fine to). In the real code I plan to make it inline and would like to avoid any significant memory hit if possible.

A couple thoughts/questions. Does the rotate array take time or memory changes or does it just effect how the array is indexed downstream? Is it better to code in the while loop or the for loop when I suspect it will likely only run a single iteration. I assume the delete from array is a poor choice as it will create a copy of the array and have to copy it over (that being said I like that it just returns the index and you can use it however you want). Is there some other method that is significantly superior? I thought of some other methods (such as sort the array and then compare points before and after to make sure they are both different) but rejected them as I think they would be much more of a memory and time hit.

Any way, any insights or suggestions are welcome. Thanks.

altenbach · ‎05-17-2024

What's your definition of a "unique point"?

What should happen if there is no unique point?

What should happen if the array is empty?

Note that equal comparisons for DBL are tricky. Maybe a blue datatype is safer, depending in the use case.

It would probably help to have reasonable default values in the inputs so we have something to try.

Maybe it would be simpler to explain what you are trying to achieve in the big picture of things.

LabVIEW Champion.

tshurtz · ‎05-18-2024

What I mean by unique point is that it is the only element (cluster of two singles) with that exact value. It is possible that the first point could have a duplication somewhere else in the array.

So for inputs you can just use pairs of randomly generated singles. It would be highly unlikely that this would result in a duplicate value, especially at the first element, so if you want to test the case that would have that you could just append the first value to the end.

I didn't want to include 40000 values as an example and the real points are a bit more structured but that shouldn't matter significanly.

For a bit of a bigger picture, this is a very small part of some very large code that is a little like a slicer for a 3D printer. Prior to this I have taken triangles in a 3D model and intersected them with a plane to get these arrays of 2D points. The two arrays of points are the start and stop values of many line segments (starts are in array a and ends are in array b). These line segments are not particularly ordered at this point but next I find the closed loop(s) that they make. There is the possibility that a given layer may have multiple loops that share a single point in space and therefore there will be duplicate points in array a. When I form my loops, I don't want to start at one of those points, so if I start by verifying that the line segment I start on does not have a duplicate in the array I can avoid this situation.

All of that being said, my curiosity is more about labviews internal functioning in general.

Is there a functional difference between using "for" or "while" loops if we intend to stop after one or two iterations but might need them all?

Does using an array rotate cost anything significant memory or time wise (like does it shift all of the values around in the memory, or does it just modify the way it is accessed in the future).

Is the delete from array as bad as I think (maybe a split array would be better).

Are the other insights that would be helpful?

Thanks again.

paul_a_cardinale · ‎05-19-2024

Is this what you are trying to do?

tshurtz · ‎05-20-2024

@paul_a_cardinale

I thought about that but it could fail under a specific condition. If the first 2 values happen to be the same (or if all values prior to the first true unique value are duplicated), then this thinks element 1 is the first unique value despite it being a duplicate in the array. So if for example you happened to have a,b,c,c,d,b,a,e it would return c (element 3) as the first because there are no duplicates after it.

This should be highly unlikely with my real data but not impossible, and I want to be certain.

Apart from that like that you get the index as an output and that I don't think it duplicates the array.

Thanks.

paul_a_cardinale · ‎05-20-2024

@tshurtz wrote:

@paul_a_cardinale

I thought about that but it could fail under a specific condition. If the first 2 values happen to be the same (or if all values prior to the first true unique value are duplicated), then this thinks element 1 is the first unique value despite it being a duplicate in the array. So if for example you happened to have a,b,c,c,d,b,a,e it would return c (element 3) as the first because there are no duplicates after it.

This should be highly unlikely with my real data but not impossible, and I want to be certain.

Apart from that like that you get the index as an output and that I don't think it duplicates the array.

Thanks.

Oops. I was way off. Try this:

tshurtz · ‎05-20-2024

@paul_a_cardinale wrote:

@tshurtz wrote:

@paul_a_cardinale

I thought about that but it could fail under a specific condition. If the first 2 values happen to be the same (or if all values prior to the first true unique value are duplicated), then this thinks element 1 is the first unique value despite it being a duplicate in the array. So if for example you happened to have a,b,c,c,d,b,a,e it would return c (element 3) as the first because there are no duplicates after it.

This should be highly unlikely with my real data but not impossible, and I want to be certain.

Apart from that like that you get the index as an output and that I don't think it duplicates the array.

Thanks.

Oops. I was way off. Try this:

That seems like it is a robust solution as well. But it brings up more questions for me. Will the split array end up duplicating memory? are two search arrays of smaller arrays faster or the same as the one big search (in reality this will nearly aways only take 1 or 2 iterations and it will still be one large array and an empty array or a single element on the other)? It also still leaves the questions of if the rotate array will duplicate the memory or if a while loop is somehow better if you intend to stop very early.

paul_a_cardinale · ‎05-21-2024

Or this:

altenbach · ‎05-21-2024

So these SGL values are actually quantized to the steps of the 3D printer and should be integers, right?

I still don't understand the problem you are trying to solve. What would it matter is element #9000 is the same as the first?

LabVIEW Champion.

tshurtz · ‎05-28-2024

@altenbach wrote:

So these SGL values are actually quantized to the steps of the 3D printer and should be integers, right?

I still don't understand the problem you are trying to solve. What would it matter is element #9000 is the same as the first?

So it isn't actually a 3D printer, but it has many similarities. The points could be quantized to integers. I actually do this prior and convert back to the nearest singles, in part to to avoid issues you are aluding to but I will be doing interpolation between them in the next step and want to avoid converting them back and forth repeatedly. And at the end I need them output as singles.

Yes, it matters if the first matches any other array element.

All of the methods I show above work to accomplish this. I am more curious about the implications (memory, speed, of using the different methods. I don't beleive any of that changes if you use integers instead.

LabVIEW

Find the first unique element in an array

Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array

Re: Find the first unique element in an array