Splunk Search

Compare a SPLUNK query result set against a very large lookup file

garryclarke
Path Finder

I have a Splunk query which, when run, returns a list of codes.

index=test | stats count by code | search count > 10

Code1 11
Code2 20
Code5 23
Code8 66

What I would now like to do is append logic to the query so that each of the codes returned is compared against a rather large lookup file (18 million rows), and only the codes from the original query that are present in that lookup file are returned. In effect, I want to compare my result set against the lookup file.

I have been struggling with join and inputlookup, as these queries seem to take forever to execute and generally return nothing because the lookup is truncated, giving the following error: "[subsearch]: Subsearch produced 18490006 results, truncating to maxout 50000."

index=test | stats count by code | search count > 10 | join code [| inputlookup MY_CODE_LOOKUP.csv]

The result set from the original query will always be small, i.e. approximately 5 codes.
When I take the codes returned from the original query and grep them at the Unix command line, they return within seconds.
Is there a more elegant way I could do this within Splunk?
Any help much appreciated.

somesoni2
Revered Legend

Try this

index=test
| stats count by code
| search count > 10
| join code
    [| inputlookup MY_CODE_LOOKUP.csv
     | search [ search index=test | stats count by code | table code ]]
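For reference, the inner [ search index=test | stats count by code | table code ] subsearch is expanded into a filter of the form ( code="Code1" OR code="Code2" OR ... ), so the inputlookup only hands the matching rows to the join instead of all 18 million, which keeps it under the 50,000-row subsearch limit. A minimal variation of the same idea, untested and not part of the accepted answer, repeats the count filter in the inner subsearch so only the handful of qualifying codes is matched, and keeps just the key column in the joined rows:

index=test
| stats count by code
| search count > 10
| join code
    [| inputlookup MY_CODE_LOOKUP.csv
     | search [ search index=test | stats count by code | search count > 10 | table code ]
     | fields code ]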

garryclarke
Path Finder

Thanks somesoni2, your suggestion has given me the logic to get the solution to work. I do however see a performance issue when running it: determining the initial stats count by code takes only a few seconds, but the join against the 18 million row lookup takes an additional 2 minutes.

Is there a way to prevent the subsearch from scanning the entire lookup file and instead return on the first match, given that otherwise we might be searching the whole file to the end?

Taking the codes returned and grep'ing them at the Unix command line still only takes a few seconds.
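
There does not appear to be an option that makes a subsearch over a CSV lookup stop at the first match. One alternative that might be worth testing, sketched here rather than verified against this data, is to drop the join and subsearch entirely and use the lookup command, so each of the roughly 5 result rows is checked against the lookup instead of streaming the lookup back through a subsearch. This assumes the CSV's key column is also named code and that the lookup command can reference MY_CODE_LOOKUP.csv directly (otherwise a lookup definition for the file would be needed); code_in_lookup is just a hypothetical helper field name:

index=test
| stats count by code
| search count > 10
| lookup MY_CODE_LOOKUP.csv code OUTPUT code AS code_in_lookup
| where isnotnull(code_in_lookup)
| fields - code_in_lookup

The helper field is only populated when the code exists in the lookup, so the where clause keeps exactly the codes that are present in the file. Whether this is actually faster than the join on an 18 million row file would need to be measured.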
