Hi All,
I'd like to know whether there are any built-in apps in Splunk that can analyze text or strings and give me the most-used words or phrases in a field. I have a field, short_description, which contains the description of each ticket. I tried stats count by short_description with the word cloud viz, but it treats each whole string as a single value, and there are too many distinct values for the short descriptions of the tickets.
Is there a way to get the most-used words or phrases from that field and display them, as in the word cloud viz?
Thanks in advance.
All the best,
Nicolo
There are a few routes to take.
This is a sentiment analysis app. It uses naive Bayes and lets you train it on your own data from the CLI.
https://splunkbase.splunk.com/app/1179/
This is the Machine Learning Toolkit, which comes with a lot of algorithms, including TF-IDF for feature extraction on text fields, so that other algorithms can then be run on the extracted terms.
https://splunkbase.splunk.com/app/2890/
If you can limit yourself to words rather than phrases, then this search should work:
index="myIndex" | makemv myField | mvexpand myField | stats count by myField
It assumes that words are space-separated; if you have any other separator, just tweak the makemv command (for example, with its delim option).
What happens here is that makemv splits a normal text field into a multivalue field, mvexpand "flattens" it (puts each value into a separate event), and the stats part just does its stats magic 🙂
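If the descriptions contain punctuation or mixed case, a slightly longer variant may give cleaner counts. This is only a sketch, assuming the field is short_description as in the question and the index is myIndex as above; the field name word, the minimum length of 3 and the limit of 50 are arbitrary choices for illustration. It lowercases the text, turns every run of non-alphanumeric characters into a single space, splits on spaces, and keeps the 50 most frequent words:

```
index="myIndex"
| eval word=lower(short_description)
| eval word=replace(word, "[^a-z0-9]+", " ")
| eval word=split(trim(word), " ")
| mvexpand word
| where len(word) > 2
| stats count by word
| sort - count
| head 50
```

The where clause drops empty strings and very short tokens such as "to" or "in"; adjust or remove it as needed. The result has one row per word with its count, which is the shape the word cloud viz expects. For two-word phrases, one option would be to add `| eval word=mvzip(mvindex(word, 0, -2), mvindex(word, 1, -1), " ")` just before mvexpand, which pairs each word with the one that follows it.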
Try the Splunk Machine Learning Toolkit app, especially its built-in TFIDF (term frequency–inverse document frequency) statistic.
After that, you can use the word cloud viz.
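As a rough starting point, the TFIDF step might look something like the search below. This assumes the Machine Learning Toolkit is installed and that the field is short_description as in the question; the index name and the parameter values (ngram_range, max_features, stop_words) are illustrative and are written as I recall them from the MLTK documentation, so check them against your installed version:

```
index="myIndex"
| fit TFIDF short_description ngram_range=1-2 max_features=50 stop_words=english
```

With ngram_range=1-2 the terms should include two-word phrases as well as single words. The output is one numeric field per term rather than one row per term, so it would still need to be aggregated into term and score rows before the word cloud viz can display it.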