Solved: Re: Search Optimization saved search using time vs...

wrussell12 · ‎08-06-2019

I currently have a search, which takes 5 minutes to complete, I did not write the search query, and would like to see if I can optimize it.

This includes a | where _time < now()-30, over a 7 day span.

I am a bit confused at this where clause. Is this searching for anything within the last 7 days, but older than "30"ms ago?

index="car" OR index="keys" source=key.* | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source | where _time < now()-30 | rename key.* as * | table *

Results:
Execution Time: 5 Minutes
Events Searched: 115,096,419
Events Found: 169

My optimization removes the where clause and adds earliest=-7d latest=now()-30.

Optimized

index="car" OR index="keys" source=key.*  earliest=-7d latest=now()-30  | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source  | rename key.* as * | table *

Results:
Execution Time: 1 Minute
Events Searched: 17,152
Events Found: 169

Although the 169 results are the same, this search changes and comparing exactly results will not work.
Is this a good optimization?

niketn · ‎08-06-2019

@wrussell12 now()-30 means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.

Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!

Try the following run anywhere example based on Splunk's _internal index and confirm

With regular index search

index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s
| timechart span=1h count by source

169 results by scanning 609,716 events in 4.832 second

With tstats

| tstats count as EventCount where index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source

If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )

| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h 
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
          | eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
          | makemv Time delim="|"
          | mvexpand Time
          | eval _time=Time
          | fields - Time
          | bin _time span=1h
          | eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0

This search has completed and has returned 169 results by scanning 610,450 events in 0.121 seconds

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

niketn · ‎08-06-2019

@wrussell12 now()-30 means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.

Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!

Try the following run anywhere example based on Splunk's _internal index and confirm

With regular index search

index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s
| timechart span=1h count by source

169 results by scanning 609,716 events in 4.832 second

With tstats

| tstats count as EventCount where index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source

If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )

| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h 
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
          | eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
          | makemv Time delim="|"
          | mvexpand Time
          | eval _time=Time
          | fields - Time
          | bin _time span=1h
          | eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0

This search has completed and has returned 169 results by scanning 610,450 events in 0.121 seconds

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

wrussell12 · ‎08-07-2019

Thank you very much.

I received 169 results, and the search was reduced from 5 minutes, to 5 seconds.

Amazing.

niketn · ‎08-07-2019

Awesome, if too many events would have been an issue for your previous query now you can try -30d and what not 🙂 with tstats.

Happy Splunking!

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

nareshinsvu · ‎08-06-2019

Hi @wrussell

Your first search query is searching for All data 30mins ago. This is equivalent to - earliest="01/01/1970 11:00:00.000" latest=now()-30

Where-as your second query is time bound. Search for Data older than 30mins and until last 7 days. - earliest=-7d latest=now()-30

If your indexes have data older than 7 days, your 1st search will obviously be expensive than your 2nd search.

Search Optimization saved search using time vs where clause

Developer Spotlight with Paul Stout

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

Data-Driven Success: Splunk & Financial Services