How do I count the number of times each word occurs in a field in Splunk?

umichguy

I have a search in the form of:

index=mail sourcetype=a_mail | stats count by subject | sort -count 

This displays the subject lines of all emails from, say, the past week. The subject lines look like this:

line 1: aaa bbb ccc ddd
line 2: xxx aaa bbb yyy
line 3: aaa xxx rrr ggg

I want to count how many times aaa occurs across all of the displayed subject lines, how many times bbb occurs, and so on. Note that I'm not looking for any particular word: I just want to count every distinct word and display the words in order of decreasing frequency.

So far I have tried to remove spaces from the subject lines:

index=mail sourcetype=a_mail | stats count by subject | sort -count | rex mode=sed field=subject "s/ //g"

...and substituting a delimiter such as a comma for the spaces instead. The plan was then to store the words in KV stores(?) and count how often each word repeats, but it's not working because I don't know how to implement KV stores yet. Any ideas are appreciated.
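
For what it's worth, this idea does not need a KV store: Splunk's multivalue commands can split and count within the search itself. A minimal sketch, assuming the index, sourcetype, and subject field from the question and that words are separated by spaces. makemv splits each subject into a multivalue field, mvexpand turns each value into its own row, and stats counts the rows per word. Lowercasing first means "Re" and "re" are counted together.

```spl
index=mail sourcetype=a_mail
| eval subject=lower(subject)
| makemv delim=" " subject
| mvexpand subject
| stats count by subject
| sort -count
```

Note that the split happens before stats. If the original stats count by subject stays in front, each distinct subject contributes only once no matter how many emails carried it, so words in frequently repeated subjects would be undercounted. mvexpand can also be memory-hungry on large result sets, so a narrow time range helps.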

Raghav2384

Pardon me for literally, I mean literally, translating your example into actual data 🙂

|gentimes start=-1
|eval text="line 1: aaa bbb ccc ddd line 2: xxx aaa bbb yyy line 3: aaa xxx rrr ggg"
|eval Counter = lower(replace(text, "\W+", " "))
|makemv Counter
|mvexpand Counter
|stats sum(eval(Counter="aaa")) as aaa, sum(eval(Counter="bbb")) as bbb, sum(eval(Counter="ccc")) as ccc

Hope this gives you some ideas. @martin_mueller has answered this type of question several times; I copied one of his answers.

Thanks,
Raghav

Raghav2384

How about something like this?

|gentimes start=-1
|eval text="line 1: aaa bbb ccc ddd line 2: xxx aaa bbb yyy line 3: aaa xxx rrr ggg"
|makemv text
|mvexpand text
|eval Counter=mvcount(text)
|stats sum(Counter) as Counter by text
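
Since mvexpand leaves exactly one value per row, mvcount(text) is always 1 at that point, so sum(Counter) is simply the number of rows for each word. A sketch of an equivalent, slightly simpler version follows. It also borrows the lower/replace normalization from the first answer so that tokens such as "1:" are cleaned up before counting, and it uses makeresults instead of gentimes to generate the single test row; both changes are my own choices, not part of the original answer.

```spl
| makeresults
| eval text="line 1: aaa bbb ccc ddd line 2: xxx aaa bbb yyy line 3: aaa xxx rrr ggg"
| eval text=lower(replace(text, "\W+", " "))
| makemv text
| mvexpand text
| stats count by text
| sort -count
```

Against real data, replace the makeresults and eval text=... lines with your base search and use subject in place of text.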

umichguy

I have looked at answers like this before. My problem, however, is that I can't hardcode words into my search query, as in eval(Counter="aaa"), because I don't know in advance which words will be present, and there are thousands of them. I need something more dynamic that can discover the words on its own and then count them.
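
One way to get there without hardcoding anything is to let a regular expression discover the words. A minimal sketch, assuming the index, sourcetype, and subject field from the question: rex with max_match=0 extracts every run of word characters (\w+, meaning letters, digits, and underscore) into a multivalue field, here called word (a name chosen for illustration), and the rest of the pipeline expands, lowercases, counts, and sorts. sort 0 lifts sort's default cap on the number of returned rows, which matters with thousands of distinct words.

```spl
index=mail sourcetype=a_mail
| rex field=subject "(?<word>\w+)" max_match=0
| mvexpand word
| eval word=lower(word)
| stats count by word
| sort 0 -count
```

Every occurrence is counted, so a word that appears twice in one subject contributes 2. If percentages are also useful, | top limit=0 word can stand in for the last two lines. To hide very short words such as "re" or "fw", a filter like | where len(word) > 2 before stats is one option; that threshold is arbitrary.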

Raghav2384

That's what I thought.
