Hi all,
I have a question about k-means clustering and would appreciate some confirmation. I trained my model on sample data with fit, and it grouped the data into clusters (Cluster 1, Cluster 2, ..., Cluster k), which is fine. However, when I use the same data (same time range and same SPL) to "apply" the model that was just trained, the cluster labels returned by "apply" do not seem to match those from the trained model, judging by the statistics.
To check the statistics, I ran | outputlookup to two different files (one from the fit command and one from the apply command), and then ran ... | stats count by cluster on each. For example:
Outputlookup A (from fit command)
cluster count
1 8000
2 23
3 55
Outputlookup B (from apply command)
cluster count
1 55
2 8000
3 23
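The counts alone only show that the cluster sizes match. They do not prove that the same events ended up together. One way I could check this is to compare the labels event by event. Below is a minimal sketch in Python, outside of Splunk, assuming the two lookups have been exported and lined up so that position i in both lists refers to the same event. The function name label_mapping and the toy data are hypothetical and only mirror the counts above.

```python
from collections import Counter

def label_mapping(fit_labels, apply_labels):
    """Cross-tabulate per-event cluster labels from fit and apply.

    Returns (pairs, is_pure_relabeling), where pairs counts how often
    each (fit_label, apply_label) combination occurs.
    """
    if len(fit_labels) != len(apply_labels):
        raise ValueError("both label lists must cover the same events")
    pairs = Counter(zip(fit_labels, apply_labels))
    fit_seen = [f for f, _ in pairs]
    apply_seen = [a for _, a in pairs]
    # Pure relabeling: each fit label pairs with exactly one apply label
    # and each apply label pairs with exactly one fit label.
    one_to_one = (len(set(fit_seen)) == len(fit_seen)
                  and len(set(apply_seen)) == len(apply_seen))
    return pairs, one_to_one

# Toy data shaped like the counts above (assumed, not the real events).
fit_labels = [1] * 8000 + [2] * 23 + [3] * 55
apply_labels = [2] * 8000 + [3] * 23 + [1] * 55

pairs, is_relabeling = label_mapping(fit_labels, apply_labels)
for (fit_label, apply_label), count in sorted(pairs.items()):
    print(f"fit={fit_label} apply={apply_label} count={count}")
print("pure relabeling:", is_relabeling)
```

If every fit label pairs with exactly one apply label, the events are grouped identically and only the label numbers differ. If one fit label is spread across several apply labels, the assignments themselves differ.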
My question is whether this seemingly "random" cluster labeling from apply is expected, or whether apply should stick to the same labels as the trained model. I think the latter makes more sense. It would be great if someone could confirm this!
Thank you