Splunk Search

Speeding up a search that uses a lookup table against a large index

okretzer
Engager

I have a small lookup table with 135 dest_ip values and a search that runs that lookup table against a 40 TB index (over a 6-month period for those IPs). When I run this search (or add IPs to the lookup table, or even search just 1 or 2 IPs by themselves) against this 40 TB index for any time period longer than a month, the search takes hours, and I mean hours.

My question is: without a datamodel, how can I speed this search up? I tried tstats, but that doesn't work unless you have a datamodel (at least I could not get it to work), and I tried TERM but could not get that to work either.

Any ideas?

Here is the current search I'm using:

index=myindex src_ip=*
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| dedup src_ip, dest_ip
| table src_ip, dest_ip
| sort src_ip

The above search works great for an alert I run every 15 minutes to see if anyone hits these IPs in the lookup, but for searching a large index it takes forever. Any help in speeding up a search like this would be appreciated.

 

thank you

 


bowesmana
SplunkTrust

@okretzer 

As @richgalloway says, 40TB will take time regardless, however, from your original search I would suggest:

Replace dedup with stats; stats will generally run faster than dedup because it throws away the fields you don't need, whereas dedup has to carry all the other fields in each event, which you are discarding anyway with the table command. Also, the table command is not a good choice for large data sets, as it runs on the search head; use the fields command instead, which runs on the indexers. Currently you are passing all the data from the indexers back to the search head before discarding most of it.

index=myindex src_ip=*
| stats count by src_ip, dest_ip
| fields - count
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| sort src_ip

Do some timings and look at job inspector to get an idea.

scelikok
SplunkTrust

Hi @okretzer,

Using the TERM directive may make the search faster. Can you please try it and tell us the difference?

I think there is also a typo in the lookup, since the lookup input and output fields are both the same. I didn't change it; I only added the TERM directive as a pre-filter:

index=myindex 
    [| inputlookup mylookup.csv 
    | fields dest_ip 
    | rename dest_ip as search 
    | format "(" "TERM(" ")" ")" "OR" ")"]
| lookup mylookup.csv dest_ip OUTPUT dest_ip 
| dedup src_ip, dest_ip 
| table src_ip, dest_ip 
| sort src_ip

 

If this reply helps you, an upvote and "Accept as Solution" are appreciated.

richgalloway
SplunkTrust

Searching 40TB is going to take a while no matter what you do.  tstats will work without a datamodel, but only on indexed fields.
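To illustrate, here is a minimal tstats sketch. This assumes src_ip and dest_ip are indexed fields in myindex, which is usually not the case for raw-text sourcetypes, so verify in your environment first. The lookup output is renamed to a hypothetical matched field to avoid the same-name collision in the original search:

| tstats count where index=myindex by src_ip, dest_ip
| lookup mylookup.csv dest_ip OUTPUT dest_ip AS matched
| where isnotnull(matched)
| fields src_ip, dest_ip
| sort src_ip

If those fields are not indexed, this will simply return no values, which is one quick way to tell whether tstats is an option for your data.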

Try to put the lookup command as late in your query as possible.  That reduces the number of lookups needed, which will speed up the search.

Check the Job Inspector to see where the search is spending its time.  I suspect most is spent reading events.  The only way to speed that up is to distribute the data among more indexers.

---
If this reply helps you, Karma would be appreciated.