I have a small lookup table with 135 dest_ip values and a search that runs that lookup against a 40 TB index (covering a 6-month period for those IPs). When I run this search (or add an IP to the lookup table, or even search for just 1 or 2 IPs by themselves) against this 40 TB index for any time period longer than a month, the search takes hours, and I mean hours.
My question is: without a datamodel, how can I speed this search up? I tried tstats, but that doesn't seem to work unless you have a datamodel (at least I could not get it to work). I also tried TERM, but could not get that to work either.
Any ideas?
Here is the current search I'm using:
index=myindex src_ip=*
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| dedup src_ip, dest_ip
| table src_ip, dest_ip
| sort src_ip
The above search works great for an alert I run every 15 minutes to see if anyone hits the IPs in the lookup, but for searching a large index it takes forever. Any help with speeding up a search like this would be appreciated.
Thank you.
As @richgalloway says, 40 TB will take time regardless. However, from your original search I would suggest:
Replace dedup with stats, as I believe that will run faster than dedup. stats throws away the fields you don't need, whereas dedup has to carry all the other fields in each event, which you are discarding anyway with the table command.

Also, the table command is not good for large data sets, as it runs on the search head. Use the fields command instead, as that runs on the indexers. Currently you are passing all the data from the indexers back to the search head before discarding it.
index=myindex src_ip=*
| stats count by src_ip, dest_ip
| fields - count
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| sort src_ip
Do some timings and look at the Job Inspector to get an idea of where the time goes.
Hi @okretzer,
Using the TERM directive may make the search faster; can you please try it and tell us the difference?
I think there is also a typo in the lookup, since the lookup input and output fields are both the same. I didn't change it; I only added the TERM directive as a pre-filter.
index=myindex
[| inputlookup mylookup.csv
| fields dest_ip
| rename dest_ip as search
| format "(" "TERM(" ")" ")" "OR" ")"]
| lookup mylookup.csv dest_ip OUTPUT dest_ip
| dedup src_ip, dest_ip
| table src_ip, dest_ip
| sort src_ip
Searching 40TB is going to take a while no matter what you do. tstats will work without a datamodel, but only on indexed fields.
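For example, a datamodel-free tstats search could look like the sketch below. This assumes src_ip and dest_ip are indexed fields in myindex; by default they are usually extracted at search time, in which case this will return nothing.

| tstats count where index=myindex by src_ip, dest_ip
| fields - count

If they are only search-time fields, tstats can't help here without a datamodel or a change to index-time extraction.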
Try to put the lookup command as late in your query as possible. That reduces the number of lookups needed, which will speed up the search.
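Applied to the original search, that might look like this sketch (assuming the goal is to keep only pairs whose dest_ip appears in the lookup; matched_ip is a hypothetical field name used so the lookup output doesn't overwrite the input field):

index=myindex src_ip=*
| stats count by src_ip, dest_ip
| lookup mylookup.csv dest_ip OUTPUT dest_ip AS matched_ip
| where isnotnull(matched_ip)
| fields src_ip, dest_ip

Here the lookup runs once per unique src_ip/dest_ip pair rather than once per raw event.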
Check the Job Inspector to see where the search is spending its time. I suspect most of it is spent reading events. The only way to speed that up is to distribute the data across more indexers.