I have a small lookup table with 135 dest_ip values and a search that runs that lookup against a 40 TB index (covering a 6-month period for those IPs). When I run this search (or add an IP to the lookup table, or even search for just 1 or 2 IPs by themselves) against this 40 TB index for any time period longer than a month, the search takes hours, and I mean hours.
My question is: without a datamodel, how can I speed this search up? I tried tstats, but that doesn't seem to work unless you have a datamodel (at least I could not get it to work). I also tried TERM, but could not get that to work either.
Any ideas?
Here is the current search I'm using:
index=myindex src_ip=*
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| dedup src_ip, dest_ip
| table src_ip, dest_ip
| sort src_ip
The above search works great for an alert I run every 15 minutes to see if anyone hits the IPs in the lookup, but for searching a large index it takes forever. Any help with speeding up a search like this would be appreciated.
Thank you.
As @richgalloway says, 40 TB will take time regardless. However, from your original search I would suggest:
Replace dedup with stats, as I believe that will run faster than dedup. stats throws away the fields you don't need, whereas dedup has to carry all the other fields in each event, which you are discarding anyway with the table command.

Also, the table command is not good for large data sets, as it runs on the search head. Use the fields command instead, as that runs on the indexers. Currently you are passing all the data from the indexers back to the search head before discarding it.
index=myindex src_ip=*
| stats count by src_ip, dest_ip
| fields - count
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| sort src_ip
Do some timings and look at the Job Inspector to get an idea of where the time goes.
Hi @okretzer,
Using the TERM directive may make the search faster; can you please try it and tell us the difference?
I think there is also a typo in the lookup, since the lookup input and output fields are both the same. I didn't change it; I only added the TERM directive as a pre-filter.
index=myindex
[| inputlookup mylookup.csv
| fields dest_ip
| rename dest_ip as search
| format "(" "TERM(" ")" ")" "OR" ")"]
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| dedup src_ip, dest_ip
| table src_ip, dest_ip
| sort src_ip
Searching 40TB is going to take a while no matter what you do. tstats will work without a datamodel, but only on indexed fields.
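For example, a datamodel-free tstats search could look like the sketch below. This assumes src_ip and dest_ip are indexed fields in myindex; by default they are usually extracted at search time, in which case this will return nothing.

| tstats count where index=myindex by src_ip, dest_ip
| fields - count

If they are only search-time fields, tstats can't help here without a datamodel or a change to index-time extraction.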
Try to put the lookup command as late in your query as possible. That reduces the number of lookups needed, which will speed up the search.
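Applied to the original search, that might look like this sketch (assuming the goal is to keep only pairs whose dest_ip appears in the lookup; matched_ip is a hypothetical field name used so the lookup output doesn't overwrite the input field):

index=myindex src_ip=*
| stats count by src_ip, dest_ip
| lookup mylookup.csv dest_ip OUTPUT dest_ip AS matched_ip
| where isnotnull(matched_ip)
| fields src_ip, dest_ip

Here the lookup runs once per unique src_ip/dest_ip pair rather than once per raw event.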
Check the Job Inspector to see where the search is spending its time. I suspect most of it is spent reading events. The only way to speed that up is to distribute the data across more indexers.