Splunk Search

Compare a SPLUNK query result set against a very large lookup file

garryclarke
Path Finder

I have a Splunk query which, when run, returns a list of codes.

index=test | stats count by code | search count > 10

Code1 11
Code2 20
Code5 23
Code8 66

What I would now like to do is append logic to the query so that each of the codes returned is compared against a rather large lookup file (18 million rows), and only the codes from the original query that are present in that lookup file are returned. In effect, I want to compare my result set against the lookup file.

I have been struggling with join and inputlookup, as these queries seem to take forever to execute and generally return nothing because the lookup is truncated, giving the following error: "[subsearch]: Subsearch produced 18490006 results, truncating to maxout 50000."

index=test | stats count by code | search count > 10 | join code [| inputlookup MY_CODE_LOOKUP.csv]

The result set from the original query will always be small, i.e. approximately 5 codes.
When I take the codes returned from the original query and grep them at the Unix command line, they return within seconds.
Is there a more elegant way I could do this within Splunk?
Any help much appreciated.

somesoni2
Revered Legend

Try this

index=test
| stats count by code
| search count > 10
| join code
    [| inputlookup MY_CODE_LOOKUP.csv
     | search [ search index=test | stats count by code | table code ]]
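For reference, the inner [ search index=test | stats count by code | table code ] subsearch is expanded into a filter of the form ( code="Code1" OR code="Code2" OR ... ), so the inputlookup only hands the matching rows to the join instead of all 18 million, which keeps it under the 50,000-row subsearch limit. A minimal variation of the same idea, untested and not part of the accepted answer, repeats the count filter in the inner subsearch so only the handful of qualifying codes is matched, and keeps just the key column in the joined rows:

index=test
| stats count by code
| search count > 10
| join code
    [| inputlookup MY_CODE_LOOKUP.csv
     | search [ search index=test | stats count by code | search count > 10 | table code ]
     | fields code ]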

garryclarke
Path Finder

Thanks somesoni2, your suggestion has given me the logic to get the solution to work. I do however see a performance issue when running it: determining the initial stats count by code takes only a few seconds, but the join against the 18 million row lookup takes an additional 2 minutes.

Is there a way to prevent the subsearch from scanning the entire lookup file and instead return on the first match, given that otherwise we might be searching the whole file to the end?

Taking the codes returned and grep'ing them at the Unix command line still only takes a few seconds.
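
There does not appear to be an option that makes a subsearch over a CSV lookup stop at the first match. One alternative that might be worth testing, sketched here rather than verified against this data, is to drop the join and subsearch entirely and use the lookup command, so each of the roughly 5 result rows is checked against the lookup instead of streaming the lookup back through a subsearch. This assumes the CSV's key column is also named code and that the lookup command can reference MY_CODE_LOOKUP.csv directly (otherwise a lookup definition for the file would be needed); code_in_lookup is just a hypothetical helper field name:

index=test
| stats count by code
| search count > 10
| lookup MY_CODE_LOOKUP.csv code OUTPUT code AS code_in_lookup
| where isnotnull(code_in_lookup)
| fields - code_in_lookup

The helper field is only populated when the code exists in the lookup, so the where clause keeps exactly the codes that are present in the file. Whether this is actually faster than the join on an 18 million row file would need to be measured.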
