Solved: How to count lookup matches by the field values in...

Glasses · ‎02-12-2020

Hi,
I was given a request to use csv lists (i.e. lookups) with keyword values to find USB writes in an index where a field name of "file-name" is file info written to usb. The file-name values are not consistent and most often the value returns as a file path, like "D:/Downloads/foo/bar/foo-bar.txt" or something like that. So file-name is actually a file path.

I was asked to use a csv supplied to me as lookup criteria, like this...

keyword.csv is the lookup name

keyword      keyword-ID
*red*           34948-kjas
*green*       89050-kjec
*blue*         89008-nkme

the column header fields are "keyword" (which is a wildcard string) and "keyword-ID" (which is a rando ID)

I wrote a query like this...

index=foo sourcetype=bar [|inputlookup keyword.csv |fields keyword | rename keyword as file-name] |stats count by file-name

and I get the counts of each unique file-name which is what I thought the requestor wanted but that is not the case.

They want to know the count by keyword, like red = 5 green = 1 and blue =3 etc...

So I am stuck getting the results from my query piped back into a lookup to count by the key words...

I am not sure how I get this done. I was advised in slack to use wildcard matching to reverse the lookup but I could not get it to work.

index=foo sourcetype=bar [|inputlookup keyword.csv |fields keyword | rename keyword as file-name] | >>>> ?

Any advice appreciated!!

UPDATE

So after a few iterations I came up with this, but don't know if this is the best way to do it....

from here...

index=foo sourcetype=bar [|inputlookup keyword.csv |fields keyword | rename keyword as file-name] | >>>> ?

I verified the lookup definition for keyword.csv is WILDCARD(keyword) under advanced settings as well as all the keywords are * keyword *...

index=foo sourcetype=bar [|inputlookup keyword.csv |fields keyword | rename keyword as file-name] | rename file-name as keyword | lookup keyword.csv keyword OUTPUT keyword as Matched |stats count by Matched

now I get for results...

Matched Count

*red*     10
*green*    5
*blue*      2

These are the results they want.

anyone know how to improve this?

Thank you

dmarling · ‎02-13-2020

Hi @Glasses

I believe I have the answer to your question. The easiest way to do this is to use a lookup definition with a wildcard lookup since you already have asterisks surrounding your keywords in the lookup file. I wrote a run anywhere example to demonstrate how to do this. First I generated a keyword.csv lookup file using your sample data:

| makeresults count=1
| fields - _time
| eval data="*red*,34948-kjas
*green*,89050-kjec
*blue*,89008-nkme"
| rex field=data max_match=0 "(?<data>[^\n]+)"
| mvexpand data
| rex field=data "(?<keyword>[^\,]+),(?<keyword_ID>[^\e]+)"
| rename keyword_ID as keyword-ID
| fields - data
| outputlookup keyword.csv

I then created a wildcard lookup definition and titled it wildcardKeywords:

I updated the permissions so others could use it too, but that may not be necessary. Once you have that lookup definition you will need to add that to your query with the below syntax using your example from the question:

[| inputlookup keyword.csv 
    | fields keyword 
    | rename keyword as file-name] index=foo sourcetype=bar 
| lookup wildcardKeywords keyword as "file-name" output keyword as Matched
| eval Matched=trim(Matched, "*")
| stats count by Matched

I have also created a run anywhere example that uses the example lookup that I created earlier:

| makeresults count=1000
| eval random=round(random() % 5,0)
| eval file_path=case(random=0, "/foo/bar/blue-foo.log", random=1, "/bar/foo/red/blue-bar.log", random=2, "/foo/bar/green/red-foo.log", random=3, "/bar/foo/green.log", random=4, "/foo/bar/red.log", random=5, "/foo/bar/foobar.log")
| lookup wildcardKeywords keyword as file_path output keyword as matched
| eval matched=trim(matched, "*")
| stats count values(file_path) as examples by matched

This will produce the below results. The counts will be different since I used random to generate data:

If this comment/answer was helpful, please up vote it. Thank you.

View solution in original post

dmarling · ‎02-13-2020

Hi @Glasses

I believe I have the answer to your question. The easiest way to do this is to use a lookup definition with a wildcard lookup since you already have asterisks surrounding your keywords in the lookup file. I wrote a run anywhere example to demonstrate how to do this. First I generated a keyword.csv lookup file using your sample data:

| makeresults count=1
| fields - _time
| eval data="*red*,34948-kjas
*green*,89050-kjec
*blue*,89008-nkme"
| rex field=data max_match=0 "(?<data>[^\n]+)"
| mvexpand data
| rex field=data "(?<keyword>[^\,]+),(?<keyword_ID>[^\e]+)"
| rename keyword_ID as keyword-ID
| fields - data
| outputlookup keyword.csv

I then created a wildcard lookup definition and titled it wildcardKeywords:

I updated the permissions so others could use it too, but that may not be necessary. Once you have that lookup definition you will need to add that to your query with the below syntax using your example from the question:

[| inputlookup keyword.csv 
    | fields keyword 
    | rename keyword as file-name] index=foo sourcetype=bar 
| lookup wildcardKeywords keyword as "file-name" output keyword as Matched
| eval Matched=trim(Matched, "*")
| stats count by Matched

I have also created a run anywhere example that uses the example lookup that I created earlier:

| makeresults count=1000
| eval random=round(random() % 5,0)
| eval file_path=case(random=0, "/foo/bar/blue-foo.log", random=1, "/bar/foo/red/blue-bar.log", random=2, "/foo/bar/green/red-foo.log", random=3, "/bar/foo/green.log", random=4, "/foo/bar/red.log", random=5, "/foo/bar/foobar.log")
| lookup wildcardKeywords keyword as file_path output keyword as matched
| eval matched=trim(matched, "*")
| stats count values(file_path) as examples by matched

This will produce the below results. The counts will be different since I used random to generate data:

If this comment/answer was helpful, please up vote it. Thank you.

Glasses · ‎02-13-2020

Thank you Dmarling.
It will take me time to work thru your query.
I was hoping to run just one query against the "lookup" csv s which are updated daily.

Do you know if there is a way to add the keyword value from the csv to the results?

dmarling · ‎02-13-2020

I'm assuming the daily updated lookup file does not contain the asterisks around the keywords. You could create a scheduled search that runs after the daily update occurs on that lookup file. That search will update the lookup you will use for the wild card lookup definition. The below query can do that:

|inputlookup keyword.csv
| eval keywords="*".keyword."*"
| outputlookup wildcardkeyword.csv

You would then need to update your lookup definition to point at the wildcardkeyword file. I believe I have solved the request to add the keyword value from the csv to the results in my original answer. The screen shot shows that the value of the keyword field is present on the outputted table.

If this comment/answer was helpful, please up vote it. Thank you.

Glasses · ‎02-13-2020

sorry, yes I did wrap the word with stars.

I think I am good now, and I thank you for all your help.

mydog8it · ‎02-12-2020

Try this...

index=foo sourcetype=bar 
| lookup keyword.csv keyword OUTPUT keyword AS Matched 
| stats count by Matched

Glasses · ‎02-12-2020

thx for the reply however that would not work in my case

How to count lookup matches by the field values in the Lookup?

Splunk Observability for AI

[Puzzles] Solve, Learn, Repeat: Dereferencing XML to Fixed-length events

Stay Connected: Your Guide to December Tech Talks, Office Hours, and Webinars!

Are you a member of the Splunk Community?

How to count lookup matches by the field values in the Lookup?

Splunk Observability for AI

[Puzzles] Solve, Learn, Repeat: Dereferencing XML to Fixed-length events

Stay Connected: Your Guide to December Tech Talks, Office Hours, and Webinars!