Splunk Search

Search Optimization saved search using time vs where clause

wrussell12
Explorer

I currently have a search, which takes 5 minutes to complete, I did not write the search query, and would like to see if I can optimize it.

This includes a | where _time < now()-30, over a 7 day span.

I am a bit confused at this where clause. Is this searching for anything within the last 7 days, but older than "30"ms ago?

index="car" OR index="keys" source=key.* | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source | where _time < now()-30 | rename key.* as * | table *

Results:
Execution Time: 5 Minutes
Events Searched: 115,096,419
Events Found: 169

My optimization removes the where clause and adds earliest=-7d latest=now()-30.

Optimized

index="car" OR index="keys" source=key.*  earliest=-7d latest=now()-30  | bucket _time span=1h | stats count by source, _time | timechart span=1h values(count) by source  | rename key.* as * | table *

Results:
Execution Time: 1 Minute
Events Searched: 17,152
Events Found: 169

Although the 169 results are the same, this search changes and comparing exactly results will not work.
Is this a good optimization?

0 Karma
1 Solution

niketn
Legend

@wrussell12 now()-30 means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.

Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!

Try the following run anywhere example based on Splunk's _internal index and confirm


With regular index search

index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s
| timechart span=1h count by source

169 results by scanning 609,716 events in 4.832 second


With tstats

| tstats count as EventCount where index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source

If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )

| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h 
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
          | eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
          | makemv Time delim="|"
          | mvexpand Time
          | eval _time=Time
          | fields - Time
          | bin _time span=1h
          | eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0

This search has completed and has returned 169 results by scanning 610,450 events in 0.121 seconds

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

View solution in original post

niketn
Legend

@wrussell12 now()-30 means pick events from 30 seconds before the search query runs. Not sure if this is to accommodate for delay in data coming to Splunk or for Real-Time search with 30 seconds window or for Search panel with refresh every 30 seconds.

Since you are working only with metadata fields index and source and _time you can write tstats which will work way faster than what you have!

Try the following run anywhere example based on Splunk's _internal index and confirm


With regular index search

index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s
| timechart span=1h count by source

169 results by scanning 609,716 events in 4.832 second


With tstats

| tstats count as EventCount where index IN ("_internal","_audit") source=*  earliest=-7d latest=@s-30s by source _time span=1h
| timechart span=1h sum(EventCount) as EventCount by source

If you need null values for selected time span you would need the following hack with appendpipe (as per one of my older answers for this: https://answers.splunk.com/answers/595248/timechart-with-no-data-gives-no-results-found.html )

| tstats count as EventCount where index IN ("_internal","_audit") source=* earliest=-7d latest=@s-30s by source _time span=1h 
| timechart span=1h sum(EventCount) as EventCount by source
| appendpipe [| makeresults
          | eval Time=relative_time(now(),"-7d")."|".relative_time(now(),"-30s")
          | makemv Time delim="|"
          | mvexpand Time
          | eval _time=Time
          | fields - Time
          | bin _time span=1h
          | eval EventCount=0]
| dedup _time
| sort _time
| fillnull value=0

This search has completed and has returned 169 results by scanning 610,450 events in 0.121 seconds

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

wrussell12
Explorer

Thank you very much.

I received 169 results, and the search was reduced from 5 minutes, to 5 seconds.

Amazing.

niketn
Legend

Awesome, if too many events would have been an issue for your previous query now you can try -30d and what not 🙂 with tstats.

Happy Splunking!

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

nareshinsvu
Builder

Hi @wrussell

Your first search query is searching for All data 30mins ago. This is equivalent to - earliest="01/01/1970 11:00:00.000" latest=now()-30

Where-as your second query is time bound. Search for Data older than 30mins and until last 7 days. - earliest=-7d latest=now()-30

If your indexes have data older than 7 days, your 1st search will obviously be expensive than your 2nd search.

Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...