Splunk Search

Count and display the top 10 word occurrences across events?

sousouheyl
Engager

Hello everyone,

I'm trying to count the occurrences of every word across all events and get the top 10.

Each sentence is an event:

Cisco Products IPv6 Neighbor Discovery Crafted Packet Denial Service Vulnerability
Cisco Application Policy Infrastructure Controller Binary Files Privilege Escalation Vulnerability
Cisco Aironet 3800 Series Access Point Platforms ARP Request Handling Denial Service Vulnerability
Cisco IP Phone 8800 Series Web Application Buffer Overflow Vulnerability
Cisco IOS XR Software LPTS Denial Service Vulnerability
Cisco Aironet Access Points Command-Line Interpreter Linux Shell Command Injection Vulnerability
Cisco WebEx Meeting Center Site Access Control User Account Enumeration Vulnerability
Cisco Prime Infrastructure Evolved Programmable Network Manager Remote Code Execution Vulnerability
Cisco IP 8800 Series Phones btcli Utility Command Injection Vulnerability
Cisco Prime Network Analysis Module Unauthenticated Remote Code Execution Vulnerability
Cisco Prime Network Analysis Module IPv6 Denial Service Vulnerability
Cisco Prime Network Analysis Module Authenticated Remote Code Execution Vulnerability
Cisco Prime Network Analysis Module Local Command Injection Vulnerability
Multiple Vulnerabilities OpenSSL Affecting Cisco Products: May 2016
Cisco ESA WSA AMP ClamAV Denial Service Vulnerability
Cisco Firepower Management Center Web Interface Code Injection Vulnerability
Cisco UCS Invicta Software Default GPG Key Vulnerability
Cisco Prime Infrastructure Cisco Evolved Programmable Network Manager JSON Privilege Escalation Vulnerability
Multiple Vulnerabilities OpenSSL Affecting Cisco Products: March 2016

For example, here the result would be:

Cisco
Vulnerability
Prime
Network
Denial
Service
Analysis
Code
Injection
Module

Thanks for your help,

Best regards,


Richfez
SplunkTrust

Here's a run-anywhere example using some of your data:

| gentimes start=6/6/2016 end=6/7/2016 
| eval myField="Cisco Products IPv6 Neighbor Discovery Crafted Packet Denial Service Vulnerability Cisco Application Policy Infrastructure Controller Binary Files Privilege Escalation Vulnerability Cisco Aironet 3800 Series Access Point Platforms ARP Request Handling Denial Service Vulnerability" 
| makemv delim=" " allowempty=true myField 
| mvexpand myField
| stats count by myField | sort - count

What you'll need is the last three lines, but I'll explain all of it so you know where everything comes from.

gentimes just creates a "fake" event (a single one).
eval then creates a field called myField, set to a few of your sample lines joined together. This technique will work on your separate events, too.
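On Splunk versions that include the makeresults command, it can stand in for gentimes as the fake-event generator. A minimal sketch, assuming makeresults is available in your version (by default it returns a single result); the sample sentence is one of the lines from the question:

```spl
| makeresults
| eval myField="Cisco IOS XR Software LPTS Denial Service Vulnerability"
| makemv delim=" " myField
| mvexpand myField
| stats count by myField
| sort - count
```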

Then, for the important bits.

makemv turns your single string of "stuff" (Cisco Products IPv6...) into a multivalued field by splitting it on spaces. If you run the search only up to that point, you'll see what it does.
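Stopping right after makemv and counting the values with mvcount() is one way to check the split. A sketch using a single sample line; numWords is an illustrative field name, and for this eight-word sentence it should come out as 8:

```spl
| gentimes start=6/6/2016 end=6/7/2016
| eval myField="Cisco IOS XR Software LPTS Denial Service Vulnerability"
| makemv delim=" " allowempty=true myField
| eval numWords=mvcount(myField)
| table myField numWords
```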

Then mvexpand turns each value of that multivalued field into a separate event.
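Once mvexpand has run, every word is its own result, so a quick sanity check is to count the rows and the distinct words. A sketch; totalWords and distinctWords are illustrative names. mvexpand also accepts a limit=<int> argument if you ever need to cap how many values it expands per event:

```spl
| gentimes start=6/6/2016 end=6/7/2016
| eval myField="Cisco Prime Network Analysis Module IPv6 Denial Service Vulnerability"
| makemv delim=" " myField
| mvexpand myField
| stats count AS totalWords dc(myField) AS distinctWords
```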

Then it's easy: just use stats to count by that field and sort the results. You specifically wanted a top 10, so you can also use | top limit=10 myField instead of the | stats count by myField | sort - count I used.
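Sorting alone returns every word, so to stop at ten with the stats approach you can append head 10. A sketch over two of the sample lines:

```spl
| gentimes start=6/6/2016 end=6/7/2016
| eval myField="Cisco Prime Network Analysis Module IPv6 Denial Service Vulnerability Cisco Prime Network Analysis Module Local Command Injection Vulnerability"
| makemv delim=" " allowempty=true myField
| mvexpand myField
| stats count by myField
| sort - count
| head 10
```

Swapping the last three lines for | top limit=10 myField showperc=false should give a similar table, minus the percent column that top adds by default.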

sousouheyl
Engager

Thanks for your fast answer.

If I understand correctly, only the last three lines are what I need? Or do I also need to run "gentimes" and "eval"?

source="/Users/NS/Downloads/mkm" sourcetype="test" gentimes start=12/6/2016 end=13/6/2016
| eval myField="Cisco Products IPv6 Neighbor Discovery Crafted Packet Denial Service Vulnerability Cisco Application Policy Infrastructure Controller Binary Files Privilege Escalation Vulnerability Cisco Aironet 3800 Series Access Point Platforms ARP Request Handling Denial Service Vulnerability"
| makemv delim=" " allowempty=true myField
| mvexpand myField
| stats count by myField | sort - count

That's the search I ran in my Splunk.

It displays "No results found".

But let me ask you a question: when you define eval myField="Cisco ....", does it take only the string inside the quotes, or all the events?

Thx a lot,


sousouheyl
Engager

That works!!!

I used this method:

sourcetype="test"
| eval myField=_raw
| makemv delim=" " myField
| mvexpand myField
| top limit=10 myField

It displays exactly what I want!

So for this part it's perfect.
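Splitting _raw on spaces keeps punctuation attached to words, so in the sample data "Products:" is counted separately from "Products". A sketch that extracts words with rex instead, assuming a "word" is a run of letters, digits, underscores, or hyphens (word is an illustrative field name; max_match=0 keeps every match as a multivalued field):

```spl
sourcetype="test"
| rex field=_raw max_match=0 "(?<word>[\w-]+)"
| mvexpand word
| top limit=10 word
```

Adding | eval word=lower(word) before top would also merge words that differ only in case.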

Just another little question:

I want to continuously monitor a text file containing 100 lines:

Cisco ...
Microsoft ..
Symantec
Azerty
ERTY
..

So I went to Settings -> Data Inputs -> Files and directories -> New, chose the path, and selected Continuously Monitor.
On the source type step, I set Event Breaks to Every Line and Timestamp to Current time.

Then I clicked Save As:

Name: azerty
Description:
Category: Customize
App: Search & Reporting

After that, I clicked Next:

App context: Search and Reporting
Host field value - Constant value: NS
Index: default

But when I finish everything, I get "No results found" when I run either of these two searches:
source="/Users/NS/Downloads/allo" host="NS" sourcetype="TEST"
or
sourcetype="TEST"

My file wasn't indexed, and I couldn't find sourcetype=TEST.

I've done this many times successfully with structured files; this is the first time I've run into this problem.

My guess is that Splunk can't continuously monitor the file because it's plain text only: there are no timestamps, fields, or any other markers, even though I chose my system's time as the timestamp.

So I tried exactly the same configuration, but with Index Once instead of Continuously Monitor (Data Inputs -> Index Once -> same configuration as before), and... that works.

What I actually need is continuous monitoring, because otherwise if I modify anything in the text file, I won't see the change in Splunk, and that's the problem. Is that possible for a plain text file?

Do you have any idea?

Thanks,
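When a monitored file never shows up, splunkd's own log is a reasonable place to look. A diagnostic sketch, assuming you can search the _internal index; TailingProcessor and WatchedFile are the splunkd components that handle monitored files, and the path is the one from the post. The messages returned should indicate whether the file was picked up, skipped, or failed to read, though the exact wording varies by version:

```spl
index=_internal sourcetype=splunkd (component=TailingProcessor OR component=WatchedFile) "/Users/NS/Downloads/allo"
| table _time log_level component _raw
```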


Richfez
SplunkTrust

Only the last three lines. Guessing from how you modified the search above, it could be something like this:

source="/Users/NS/Downloads/mkm" sourcetype="test" 
| makemv delim=" " myField
| mvexpand myField
| top limit=10 myField

But that's very unlikely to work as-is: you'll have to replace myField with the name of your field that contains the lines you gave as examples. You can find the field names by running your base search source="/Users/NS/Downloads/mkm" sourcetype="test" in Verbose Mode (upper right, just below the time picker) and looking at the field list down the left side. If you find one ("Message", perhaps?) with that content, use that field everywhere I have "myField" in the search above.
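To list the field names as a table instead of scanning the sidebar, the fieldsummary command returns one row per field, including a sample of its values. A sketch using the same base search:

```spl
source="/Users/NS/Downloads/mkm" sourcetype="test"
| fieldsummary
| table field count distinct_count values
```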

If you don't have a field with just those contents, you can try copying _raw (the "raw" event text) into a new field called myField and then using it as above. It probably won't work quite right, but it's easy, so it's worth a shot in case it does. That would look like this:

source="/Users/NS/Downloads/mkm" sourcetype="test" 
| eval myField=_raw
| makemv delim=" " myField
| mvexpand myField
| top limit=10 myField

But as I said, that might not work: _raw can include dates, times, and other information that would end up in your counts. If it does work, well, hooray!
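If _raw does drag dates and times into the counts, one option is to drop tokens made up only of digits and date/time punctuation before counting. A sketch using match(); note that this pattern would also drop the model numbers and years (3800, 8800, 2016) from the sample data:

```spl
source="/Users/NS/Downloads/mkm" sourcetype="test"
| eval myField=_raw
| makemv delim=" " myField
| mvexpand myField
| where NOT match(myField, "^[0-9/:.,-]+$")
| top limit=10 myField
```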

What we'll most likely need to do is create a new field containing those contents and nothing else. That isn't too hard, but it would help immensely if you ran the search source="/Users/NS/Downloads/mkm" sourcetype="test" by itself and pasted a couple of the resulting events here (remember to use the code 10101 button in the editor toolbar). Once I have that, we can probably build the new field and use it in the search.

Good luck, just hang in there and we'll get you all working soon!
