I currently have a search, which takes 5 minutes to complete, I did not write the search query, and would like to see if I can optimize it.
This includes a | where _time < now()-30, over a 7 day span.
I am a bit confused at this where clause. Is this searching for anything within the last 7 days, but older than "30"ms ago?
index="car" OR index="keys" source=key.* | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source | where _time < now()-30 | rename key.* as * | table *
Results:
Execution Time: 5 Minutes
Events Searched: 115,096,419
Events Found: 169
My optimization removes the where clause and adds earliest=-7d latest=now()-30.
Optimized
index="car" OR index="keys" source=key.* earliest=-7d latest=now()-30 | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source | rename key.* as * | table *
Results:
Execution Time: 1 Minute
Events Searched: 17,152
Events Found: 169
Although the 169 results are the same, this search changes and comparing exactly results will not work.
Is this a good optimization?
@wrussell12 now()-30
means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.
Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!
Try the following run anywhere example based on Splunk's _internal index and confirm
With regular index search
index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s
| timechart span=1h count by source
169 results by scanning 609,716 events in 4.832
second
With tstats
| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source
If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )
| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
| eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
| makemv Time delim="|"
| mvexpand Time
| eval _time=Time
| fields - Time
| bin _time span=1h
| eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0
This search has completed and has returned 169 results by scanning 610,450 events in 0.121
seconds
@wrussell12 now()-30
means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.
Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!
Try the following run anywhere example based on Splunk's _internal index and confirm
With regular index search
index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s
| timechart span=1h count by source
169 results by scanning 609,716 events in 4.832
second
With tstats
| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source
If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )
| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
| eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
| makemv Time delim="|"
| mvexpand Time
| eval _time=Time
| fields - Time
| bin _time span=1h
| eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0
This search has completed and has returned 169 results by scanning 610,450 events in 0.121
seconds
Thank you very much.
I received 169 results, and the search was reduced from 5 minutes, to 5 seconds.
Amazing.
Awesome, if too many events would have been an issue for your previous query now you can try -30d and what not 🙂 with tstats.
Happy Splunking!
Hi @wrussell
Your first search query is searching for All data 30mins ago. This is equivalent to - earliest="01/01/1970 11:00:00.000" latest=now()-30
Where-as your second query is time bound. Search for Data older than 30mins and until last 7 days. - earliest=-7d latest=now()-30
If your indexes have data older than 7 days, your 1st search will obviously be expensive than your 2nd search.