Hi Everyone,
I have data like this in my lookup table, fitcommandexample.csv:
A   | B   | C
10  | 2   | red
4   | 6   | red
9   | 1   | red
110 | 102 | blue
104 | 106 | blue
109 | 101 | blue
If I run the fit command with a by clause:
| inputlookup fitcommandexample.csv | fit KMeans k=2 "A" "B" by C
Results:
A    B    C     cluster  cluster_distance
10   2    red   1        6.44444444444
4    6    red   1        22.4444444444
9    1    red   1        5.77777777778
110  102  blue  0        6.44444444444
104  106  blue  0        22.4444444444
109  101  blue  0        5.77777777778
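To check what the by clause actually did, here is a minimal k-means sketch in Python. It is plain Lloyd's algorithm with a farthest-point starting centroid (chosen only to make the run deterministic), not MLTK's own code. Fitting k=2 to all six rows at once, ignoring C, gives the same cluster_distance values as the by-clause output above (58/9 ≈ 6.444, 202/9 ≈ 22.444, 52/9 ≈ 5.778). That suggests fit trained a single model on the pooled data, and that cluster_distance is the squared Euclidean distance from each row to its assigned centroid. The 0/1 labels come out swapped relative to Splunk's, since cluster numbering is arbitrary.

```python
# Minimal k-means (Lloyd's algorithm) sketch; NOT the MLTK implementation.
# Assumption: farthest-point initialisation, used only for a deterministic run.

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k=2, max_iter=100):
    # Start at the first point, then repeatedly add the point that is
    # farthest from its nearest existing centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    distances = [sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return labels, distances


rows = [
    (10, 2, "red"), (4, 6, "red"), (9, 1, "red"),
    (110, 102, "blue"), (104, 106, "blue"), (109, 101, "blue"),
]

# One fit over all six rows; column C plays no part in the clustering.
labels, distances = kmeans([(a, b) for a, b, _ in rows])
for (a, b, c), lab, d in zip(rows, labels, distances):
    print(a, b, c, lab, round(d, 4))
```

With only two well-separated colour groups, k=2 simply rediscovers red versus blue, which is why the by-clause output looks like "one cluster per colour" rather than two clusters inside each colour.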
But if I filter to the blue rows first and fit only those:
| inputlookup fitcommandexample.csv | where C like "blue"| fit KMeans k=2 "A" "B"
Result:
A    B    C     cluster  cluster_distance
110  102  blue  0        0.5
104  106  blue  1        0.0
109  101  blue  0        0.5
Likewise, for the red rows:
| inputlookup fitcommandexample.csv | where C like "red"| fit KMeans k=2 "A" "B"
yields:
A    B    C     cluster  cluster_distance
10   2    red   1        0.5
4    6    red   0        0.0
9    1    red   1        0.5
What I was hoping for was that the by clause would make fit run on each subset (red and blue) in isolation, so that
| inputlookup fitcommandexample.csv | fit KMeans k=2 "A" "B" by C
would return:
A    B    C     cluster  cluster_distance
10   2    red   1        0.5
4    6    red   0        0.0
9    1    red   1        0.5
110  102  blue  0        0.5
104  106  blue  1        0.0
109  101  blue  0        0.5
In other words, blue and red would be treated as entirely separate datasets. Otherwise, I am not sure how to quickly split the data and apply fit to each subset without writing an external script against the API. Any ideas?
Thanks
Tim
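The map command might be your only option, as there isn't a by clause for clustering. map runs the quoted search once per input row, substituting $C$ with that row's value, so each colour gets its own independent fit. The example below builds the sample rows inline with makeresults so it runs without the lookup; its first line produces one row each for C=red and C=blue. Against the real data, the inner search could start with | inputlookup fitcommandexample.csv | search C=$C$ instead.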
|makeresults |eval data="C=red C=blue"|makemv data|mvexpand data|rename data as _raw|kv|table C
|map maxsearches=6 search="|makeresults |eval data=\"A=10,B=2,C=red A=4,B=6,C=red A=9,B=1,C=red A=110,B=102,C=blue A=104,B=106,C=blue A=109,B=101,C=blue\"|makemv data|mvexpand data|rename data as _raw|kv|search C=$C$|table A B C|fit KMeans k=2 A B"
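What map does here is a split-apply-combine: one independent fit per distinct value of C, with the results concatenated. The Python sketch below mimics that pattern outside Splunk, with one swap stated plainly: instead of MLTK's KMeans it finds the optimal two-cluster split of each group by brute force, trying every labeling and keeping the one with the lowest within-cluster sum of squares (the k-means objective). That is only practical for tiny groups like these. It reproduces the per-subset distances from the question (0.5, 0.0, 0.5 for both red and blue); the 0/1 labels may not match Splunk's, since cluster numbering is arbitrary.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def centroid(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def best_two_split(points):
    """Try every split into two non-empty clusters (2**n - 2 labelings) and keep
    the one with the lowest within-cluster sum of squared distances."""
    n = len(points)
    best = None
    for mask in range(1, 2 ** n - 1):  # skip all-0 and all-1 (an empty cluster)
        labels = [(mask >> i) & 1 for i in range(n)]
        cents = [centroid([p for p, lab in zip(points, labels) if lab == j]) for j in (0, 1)]
        dists = [sq_dist(p, cents[lab]) for p, lab in zip(points, labels)]
        if best is None or sum(dists) < sum(best[1]):
            best = (labels, dists)
    return best


rows = [
    (10, 2, "red"), (4, 6, "red"), (9, 1, "red"),
    (110, 102, "blue"), (104, 106, "blue"), (109, 101, "blue"),
]

# Split: group the (A, B) points by the value of C.
groups = defaultdict(list)
for a, b, c in rows:
    groups[c].append((a, b))

# Apply + combine: one independent fit per group, results concatenated,
# which is the role map plays in the SPL above.
results = []
for c, pts in groups.items():
    labels, dists = best_two_split(pts)
    for (a, b), lab, d in zip(pts, labels, dists):
        results.append((a, b, c, lab, d))

for r in results:
    print(*r)
```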