11-03-2019 06:05 AM
Hello!
I have implemented the K-means clustering algorithm and I want to partition the data into 5 groups. I have attached my required.png file and my code, but the code doesn't generate the desired output every time. I initialize my centroids randomly within the span of the generated data. I terminate my loop when there is no change between the previous and the newly calculated cluster centers.
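For readers without LabVIEW, the loop described above might look roughly like this in Python. This is only a sketch of the general approach (random centroids within the span of the data, stop when the centers no longer move), not the attached VI. The function name `kmeans`, its parameters, and the synthetic 5-blob test data are all made up for illustration.

```python
import random

def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples (hypothetical helper)."""
    dims = range(len(points[0]))
    # Random initial centroids drawn uniformly within the span of the data.
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    centroids = [tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sum((p[d] - centroids[j][d]) ** 2 for d in dims))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(m[d] for m in members) / len(members) for d in dims))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # terminate when no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

# Assumed test data: 5 synthetic blobs of 20 points each.
rng = random.Random(0)
true_centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in true_centers for _ in range(20)]
centroids, labels = kmeans(points, 5, rng)
print(centroids)
```

With a different seed the returned centroids can land in different places, which is the inconsistency described in this post.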
11-04-2019 12:45 AM
Hello
This is the basic problem with K-means clustering: it lacks consistency and it is not repeatable. We might get a different output each time.
Then why is K-means clustering popular? The answer is simple: it is fast, and it is usually the first algorithm taught in a course on unsupervised learning. Check this.
I was curious to know what was happening with the clustering method, so I tried implementing the same thing. It works a bit differently and gives the desired output about 8 times out of 10 (I can't give an accurate figure).
At the first iteration, we have to make sure that the random centroids are taken only once.
I have attached the VIs (please download the OpenG Array toolkit if you are not already using it). Try exploring different clustering methods.
-Rahul
Hit KUDOS for Thanks
11-23-2019 12:36 AM
But the results are still not consistent. I have come up with a solution in which I run the whole process multiple times, check whether the resulting states are the same, and then converge towards a solution. I will share the code.
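Until the code is shared, here is a rough Python sketch of the "run it multiple times" idea. Note that it swaps in a common variant: instead of checking whether the runs agree, it keeps the run with the lowest within-cluster sum of squares (inertia). The helpers `kmeans_1d` and `best_of_restarts`, the 1D toy data, and the restart count are assumptions for illustration only.

```python
import random

def kmeans_1d(data, init, max_iter=100):
    """Lloyd's algorithm in 1D from given initial centroids; returns (centroids, inertia)."""
    centroids = list(init)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, inertia

def best_of_restarts(data, k, restarts, seed):
    """Run k-means from several random starts and keep the lowest-inertia result."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    best = None
    for _ in range(restarts):
        init = [rng.uniform(lo, hi) for _ in range(k)]
        result = kmeans_1d(data, init)
        if best is None or result[1] < best[1]:
            best = result
    return best

data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]  # toy data with three obvious groups
centroids, inertia = best_of_restarts(data, k=3, restarts=100, seed=1)
print(sorted(centroids), inertia)
```

Comparing inertia gives a single number to rank runs by, whereas an agreement check needs a rule for what counts as "the same" state.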
11-23-2019 04:26 PM - edited 11-23-2019 04:26 PM
@Rahulbala wrote:
This is the basic problem with K-means clustering: it lacks consistency and it is not repeatable. We might get a different output each time.
this is only true if you use random values for initialisation.
@sets wrote:
I initialize my centroids randomly within the span of the generated data.
have you tried to reproduce your results using the same initial values?
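One way to get the same initial values on every run, sketched in Python: draw them from a random generator with a fixed seed. The helper name `initial_centroids` and the seed value are hypothetical. Since the k-means iteration itself is deterministic, identical initial values lead to an identical result.

```python
import random

def initial_centroids(data, k, seed):
    """Draw k starting centroids within the span of the data from a seeded generator."""
    rng = random.Random(seed)  # fixed seed -> the same draws on every run
    lo, hi = min(data), max(data)
    return [rng.uniform(lo, hi) for _ in range(k)]

data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
first = initial_centroids(data, 3, seed=42)
second = initial_centroids(data, 3, seed=42)
print(first == second)
```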
11-23-2019 06:15 PM
Traditionally, random initialisation is part of the K-means clustering algorithm. Fixing the initial values will definitely give you the same result every time. What do you mean by the same initial values (for example, the same index values whatever data is given as input)?
We can make a better selection of initial values by using techniques like the naive sharding centroid initialization algorithm. This makes sure that the initial values are good enough for clustering.
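A minimal Python sketch of naive sharding as I understand it: sort the rows by the sum of their attribute values, split the sorted rows into k equal shards, and take the mean of each shard as an initial centroid. The function name `naive_sharding_init` is made up, and the sketch skips any attribute scaling and assumes at least k points.

```python
def naive_sharding_init(points, k):
    """Deterministic initial centroids from k equal shards of the sorted data."""
    ordered = sorted(points, key=sum)  # composite value = sum of attributes
    n = len(ordered)
    dims = range(len(ordered[0]))
    centroids = []
    for i in range(k):
        shard = ordered[i * n // k:(i + 1) * n // k]
        # Each shard's column-wise mean becomes one starting centroid.
        centroids.append(tuple(sum(p[d] for p in shard) / len(shard) for d in dims))
    return centroids

points = [(20, 20), (0, 0), (11, 11), (1, 1), (21, 21), (10, 10)]
print(naive_sharding_init(points, 3))
```

There is no random draw here, so the same data always gives the same starting centroids.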
-Rahul
11-24-2019 06:32 AM
@Rahulbala wrote:
Fixing the initial values will definitely give you the same result every time. What do you mean by the same initial values (for example, the same index values whatever data is given as input)?
k-means is always going to converge to a solution, or rather to a local minimum.
but the quality of this solution may differ dramatically from trial to trial, because the local minimum found is not necessarily the global minimum.
this is because not every randomly picked starting point for a centroid will converge to the actual centroid.
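A tiny 1D example in Python shows the effect. The data and both initial vectors are made up for illustration, and `kmeans_1d` is a hypothetical helper, not the attached VI. The same data converges to two different fixed points depending only on where the centroids start.

```python
def kmeans_1d(data, init, max_iter=100):
    """Lloyd's algorithm in 1D from given initial centroids; returns (centroids, inertia)."""
    centroids = list(init)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, inertia

data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

# Two centroids start inside the same group: the third has to cover two groups.
poor_centroids, poor_inertia = kmeans_1d(data, [0.0, 1.0, 15.5])
# One centroid starts near each group.
good_centroids, good_inertia = kmeans_1d(data, [0.0, 10.0, 20.0])

print(poor_centroids, poor_inertia)
print(good_centroids, good_inertia)
```

The first start is already a fixed point of the iteration, so the algorithm stops there even though the within-cluster sum of squares is far larger than for the second start.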
here, the actual centroids are given in required.png
I don't have the TSA Toolkit, so in [image] I had to change [image] to [image]
furthermore, you should use a For-Loop here:
so, now we can easily look at the initial value vector and how it affects the solution found:
this instance converged in 3 steps to a not-so-good solution:
this instance converged in 3 steps to the optimal solution:
attached .vi is back-saved to LabVIEW 2010
11-24-2019 06:58 AM - edited 11-24-2019 06:59 AM
Great but even now I am getting randomised outputs.
-Rahul
11-24-2019 10:01 AM
@Rahulbala wrote:
Great but even now I am getting randomised outputs.
that's not the point.
sets already figured out on his own how to cope with this behaviour:
@sets wrote:
But the results are still not consistent. I have come up with a solution in which I run the whole process multiple times, check whether the resulting states are the same, and then converge towards a solution. I will share the code.
09-11-2024 01:23 PM
Removed OpenG & SubVI dependencies, added visualizations, & basic statistics.
This K-Means Clustering is for a 1D Data Set array with optional normalization (using the data set or an absolute min & max scale to output 0 to 1 or -1 to 1 respectively). The graph shows the data set clusters and the cursor shows the mean for each cluster.
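For anyone reading without LabVIEW, the optional normalization could be sketched in Python as below. This is my reading of the description (data-set min and max give 0 to 1, an absolute min and max scale gives -1 to 1), not a transcription of the VI. The function name `normalize` and its parameters are hypothetical.

```python
def normalize(data, lo=None, hi=None, symmetric=False):
    """Scale 1D data to 0..1, or to -1..1 when symmetric is True.

    lo/hi default to the data set's own min and max; pass them explicitly
    to use an absolute scale instead.
    """
    lo = min(data) if lo is None else lo
    hi = max(data) if hi is None else hi
    span = hi - lo
    if span == 0:
        raise ValueError("min and max must differ")
    unit = [(x - lo) / span for x in data]  # 0..1
    return [2 * u - 1 for u in unit] if symmetric else unit

data = [2.0, 4.0, 6.0]
from_data = normalize(data)                                   # data-set min/max, 0..1
absolute = normalize(data, lo=0.0, hi=8.0, symmetric=True)    # absolute scale, -1..1
print(from_data, absolute)
```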
VI Snippet in LabVIEW 2018.