We are trying to cluster a Description field with the cluster command, like this:
| cluster t=0.5 labelonly=t showcount=t field=Description match=termset | table cluster_label cluster_count Description
Do you know whether it is possible to find out which words or strings are most common in each cluster, i.e. which terms the algorithm used on the dataset to build the clusters?
I would like to add one column on the right of my output table containing the patterns that generated each cluster (the ones linked to the Description field). For example:
cluster_count | cluster_label | Description | words
120 | 1 | Bla bla bla ciao ciao | bla, ciao
80 | 2 | Day after day is better | Day
Thanks a lot,
Have you tried the TFIDF algorithm?
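To give a rough idea of what TF-IDF would produce for the "words" column, here is a minimal Python sketch run outside Splunk. It is not what the cluster command does internally; it only assumes you have exported (cluster_label, Description) pairs, treats each cluster as one document, and ranks terms by a smoothed TF-IDF score. The function name top_terms_per_cluster, the tokenizer, and the smoothing are my own assumptions; the sample rows are the ones from the question.

```python
import math
import re
from collections import Counter, defaultdict


def top_terms_per_cluster(rows, k=3):
    """rows: iterable of (cluster_label, description) pairs.

    Returns {cluster_label: [term, ...]} with the k highest-scoring terms.
    Hypothetical helper for illustration; not part of Splunk or MLTK.
    """
    # Treat each cluster as one "document" and count its terms.
    counts = defaultdict(Counter)
    for label, text in rows:
        counts[label].update(re.findall(r"[a-z0-9]+", text.lower()))

    n_clusters = len(counts)
    # Document frequency: in how many clusters does each term appear?
    df = Counter(term for c in counts.values() for term in c)

    result = {}
    for label, c in counts.items():
        total = sum(c.values())
        # Term frequency times a smoothed IDF, so a term that appears in
        # every cluster still gets a small positive weight.
        scores = {
            term: (n / total) * (math.log((1 + n_clusters) / (1 + df[term])) + 1)
            for term, n in c.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[label] = ranked[:k]
    return result


rows = [
    (1, "Bla bla bla ciao ciao"),
    (2, "Day after day is better"),
]
print(top_terms_per_cluster(rows, k=2))
# prints {1: ['bla', 'ciao'], 2: ['day', 'after']}
```

With real data you would probably also want to drop stop words ("after", "is") before scoring, since with only a few clusters they are not penalized much.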
Additionally, if you are interested in text analytics use cases, I would recommend looking into the NLP Text Analytics app, which uses MLTK algorithms: https://splunkbase.splunk.com/app/4066/#/details
Let me know if it helps.
Hi, could you please explain how the findkeywords command works? I couldn't find it anywhere in the Splunk documentation.