06/22/16client 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

pgadhari · ‎09-03-2016

Hi All,

I am splitting a Description field with "space" using Split command and generating list of keywords ( doing sort of text analytics) and doing the stats count by keywords to find top count of keywords in the data. The problem is, there is an "order" keyword which is coming twice in the Description, but I want to take the unique count of the keyword or remove the duplicate so that I can get the proper count of that "keyword" presence in that specific ticket Description field. Please help me how can I do that ? Following is my query, INC00001 is specific ticket number:

Description field has this text --> Order #1111111 Order date #06/25/13 client needs to process return but the invoice hasn't dropped yet

index=abc source="abc.csv" INC0001 | eval words=split(Description," ")| stats count by words

Getting following output from this query :

Order 2
but 1
date 1
dropped 1
hasn't 1
invoice 1
needs 1
process 1
return 1
the 1
to 1
yet 1

I want the query, that should count Order as one event instead of 2, as it is part of same ticket Description. I tried doing dedup but that is not working. I have to do this for ~15K ticket events for whatever top keyword counts I will be getting. But first I am trying to get it worked for single ticket event. Please help, this is quite urgent.

Regards
Pankaj

sundareshr · ‎09-04-2016

Try this

index=abc source="abc.csv" INC0001 | makemv Description delim=" " | mvexpand Description | stats dc(Description)  as count  by Description

pgadhari · ‎09-04-2016

Hi,

I tried that command, but this is good for some number of incidents. In my data I have 25000 events (tickets) and I want to run this for all those events, it is getting stuck saying memory error. Also, after that I have to count the keywords and sort it by count so that i can show the maximum number of occurrences of those keywords (sort of text analytics in my data) i.e. i have to count for common keywords and want to show top 50 keywords with their count. Whether we can do that by adding some search to this query ?

*command.mvexpand: output will be truncated at 134000 results due to excessive memory usage. Memory threshold of 500MB has been reached *

Regards
PG

richgalloway · ‎09-03-2016

Try ... | stats distinct_count by words.

---
If this reply helps you, Karma would be appreciated.

pgadhari · ‎09-03-2016

I used distinct_count by not getting the answer in the way i want. It is returning the table for all the fields in the data. Attaching the screenshot of the same. I want the answer in the simple way as below, so that I can sort on count for all the ticket data and show the top occurrences of the keywords. Somehow i cannot attach the screenshot due to less karma points.

words distinct_count(Assignment group) distinct_count(Category) distinct_count(Description) distinct_count(Location) distinct_count(Month _ Year) distinct_count(Name) distinct_count(Number) distinct_count(Open_Date_Time) distinct_count(Priority) distinct_count(SRT Held) distinct_count(Short description) distinct_count(Status) distinct_count(Task Type) distinct_count(asset_id_1) distinct_count(current_ticket_state) distinct_count(date_hour) distinct_count(date_mday) distinct_count(date_minute) distinct_count(date_month) distinct_count(date_wday) distinct_count(date_year) distinct_count(date_zone) distinct_count(eventtype) distinct_count(host) distinct_count(index) distinct_count(linecount) distinct_count(owner_name_1) distinct_count(punct) distinct_count(source) distinct_count(sourcetype) distinct_count(splunk_server) distinct_count(splunk_server_group) distinct_count(system_user_1) distinct_count(tag) distinct_count(tag::eventtype) distinct_count(time_submitted)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

06/22/16client 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

1032844299 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

Order 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
but 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
date 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
dropped 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
hasn't 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
invoice 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
needs 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
process 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
return 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
the 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1
to 1 1 1 1 1 1 1 1

I want in below format :

words count
Order 1
the 1
date 1

richgalloway · ‎09-04-2016

... | stats distinct_count(Description) by words should do it.

---
If this reply helps you, Karma would be appreciated.

pgadhari · ‎09-04-2016

Hi Rich

Thanks for that, it is working. Now, actually I have got the dc of words and I am counting the total occurrence of that word. Now, I have to filter the words that are not relevant like "the,is,for,that"... etc...

I am putting :

index=abc source="abc.csv" |eval words=split(Description," ") | stats distinct_count(Description) as count by words | sort - count | search words!="to" AND words!="is" AND words!="" AND words!="in" AND words!="the" AND words!="not" ....

words count
for 3565
and 3294
SAP 3178
on 2897
a 2546
of 2327

I am using search words!="keyword" to remove the keywords not wanted, but this will create a long list of the keywords and for me also it will take more time. Is there any way that I use multiple keywords in search without using != & AND operators by specifying them in the search command syntax or you can suggest some way for doing it faster? I have lot of keywords being returned from 25000 events :-). Please advise.

Regards
PG

richgalloway · ‎09-04-2016

A slightly easier construct is ... | search NOT (words="to" OR words="is" OR words="in" OR words="the"...) | ..., but that's probably not the kind of shortcut you're looking for. If I were doing this, I'd look try a lookup table of words to ignore. I'll have to think about it a while before offering a specific suggestion, however.

---
If this reply helps you, Karma would be appreciated.

pgadhari · ‎09-04-2016

Hi Rich,

I got it. Actually, I have created a list of keywords now that needs to be searched, there are some 200 to 250 words. I want to put it in lookup table and search the occurrence of each keyword in each event and need to find out the total count and percentage occurring in the raw event. Also I want to make sure there should not be duplicates while searching.. I mean if "order" word is coming twice or thrice in one event, I should count it as disctinct one.

The only challenge I am getting is comparing each keyword from the lookup file and looking for it in the raw events and showing the count and percentage. can you help me in that query please.

Regards
PG

Remove duplicate keywords (values) returned from field

06/22/16client 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

1032844299 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

ATTENTION: We’re Moving! (AGAIN!)

Deep Dive: Optimizing Telemetry Pipelines in Splunk Observability Cloud

Announcing Modern Navigation: A New Era of Splunk User Experience

Join the Conversation

Remove duplicate keywords (values) returned from field

06/22/16client 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

1032844299 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

ATTENTION: We’re Moving! (AGAIN!)

Deep Dive: Optimizing Telemetry Pipelines in Splunk Observability Cloud

Announcing Modern Navigation: A New Era of Splunk User Experience