
## How can I calculate the term frequency for all the words in a field's values?

Explorer

I am trying to calculate term frequency on a field. The field is defined as follows:
`rex field=_raw "Notes : (?.*)"`
The field is extracted correctly, but its values are free text with no fixed format, for example:

Notes :

Notes : Troubleshooting, I am simply reinstalling.
Notes : program would not start. I am reinstalling.
Notes : Made MacBook too slow
Notes : computer to slow when using the program! Need to install it into another!

There are thousands of lines like these, and I want to know the term frequency of every word in the notes field. Is there a command that does this, or some other way to achieve it in Splunk?

Any ideas?
Thanks, Yi

1 Solution
Contributor

I think the REX from that post should get you going in the right direction. I've pasted the REX below. Please see the original for more details.

```
source=*mybook*
| sort -_time
| rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g"
| eval word=_raw
| makemv delim=" " word
| mvexpand word
| eval word=lower(word)
| eval position=1
| streamstats sum(position) AS position
| table position word
| stats count min(position) max(position) by word
```
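
The search above works on `_raw`. As a rough sketch of the same idea applied only to the extracted field, the following self-contained search builds three of the sample notes with `makeresults` and counts words in them. The field name `notes` and the `|` separator used to build the sample events are assumptions made for illustration; in a real search the first four lines would be replaced by the base search and the `rex` extraction.

```
| makeresults
| eval notes="Troubleshooting, I am simply reinstalling.|program would not start. I am reinstalling.|Made MacBook too slow"
| makemv delim="|" notes
| mvexpand notes
| rex mode=sed field=notes "s/[^A-Za-z0-9']+/ /g"
| eval word=split(lower(trim(notes)), " ")
| mvexpand word
| where word!=""
| stats count AS frequency BY word
| sort - frequency
```

Here the `sed` expression replaces every run of characters that are not letters, digits, or apostrophes with a single space, so punctuation does not end up attached to words. On this sample, `i`, `am`, and `reinstalling` should each come out with a frequency of 2.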
Path Finder

I used the following syntax to count the frequency of terms in my field:

```
| rename COMMENTS_4 AS text
| rex mode=sed field=text "s/[,.!]/ /g"
| makemv text
| mvexpand text
| eval wordCount = mvcount(text)
| stats sum(wordCount) as "Word Map Text Analysis" by text
```

The line `| rename COMMENTS_4 AS text` just renames my field to `text`. So, assuming you rename your own field to `text`, you can count the terms using the multivalue (MV*) commands.
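
One observation on the search above: after `mvexpand`, each event holds a single word, so `mvcount(text)` is always 1 and summing it is the same as counting events. A minimal sketch of that shorter form, using a made-up sample string instead of `COMMENTS_4` (the sample text and the `frequency` column name are assumptions for illustration):

```
| makeresults
| eval text="too slow, too slow!"
| rex mode=sed field=text "s/[,.!]/ /g"
| makemv text
| mvexpand text
| stats count AS frequency BY text
```

The `g` flag on the `sed` expression matters here: without it only the first punctuation mark in each value is replaced. On this sample, `too` and `slow` should each get a frequency of 2.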

Explorer

Thanks, jimodonald! I tried the REX and it works. Now I have another question: can I cluster similar words into one class, such as fast, quick, rapid, and swift?

Path Finder

You have to use a lexicon. Look up the Node.js library for WordNet and upload that library. Then build a new app in Splunk. Once that is done, create a .js file that calls the WordNet library, then define a search manager in the .js file that returns your Splunk search. Loop through all the words and pass each one to the WordNet library to build a temporary synonym dictionary. You can optionally save this dictionary as a KV store and continually update it.

I know I didn't give details, but that's because it is a highly involved solution. It is possible, though. Start poking around with WordNet and its capabilities.

Keep in mind that all custom Splunk apps are basically Node.js apps, or at least that is my current understanding. Community, let me know if I am wrong!

Contributor

Splunk is not going to know what words are synonyms. It could likely be done with a case statement or a lookup table. Either way the synonyms would need to be identified and linked back to a common word.
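
As a rough illustration of the `case` approach, the sketch below maps a few hand-picked synonyms to a common class before counting. The word lists, the `word_class` field name, and the sample words are all assumptions; a real list would have to be built by hand or taken from a lexicon.

```
| makeresults
| eval word=split("fast quick rapid swift slow program", " ")
| mvexpand word
| eval word_class=case(
    match(word, "^(fast|quick|rapid|swift)$"), "fast",
    match(word, "^(slow|sluggish)$"), "slow",
    true(), word)
| stats count AS frequency BY word_class
```

Words that match no class fall through to `true(), word` and keep their own name. For a longer synonym list, a lookup is probably easier to maintain: the `case` could be replaced by something like `| lookup word_synonyms.csv word OUTPUT word_class | eval word_class=coalesce(word_class, word)`, where `word_synonyms.csv` is a hypothetical lookup file with `word` and `word_class` columns.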
