Splunk Search

Consider time range when searching in two indexes

syk19567
Explorer

Hi community,

My forwarder sent logs to index A before 2024/06/01 and has sent them to index B since that date. To avoid missing any data when searching, I need a query that searches both indexes.

(index="A" "reports" "arts") OR (index="B" "reports" "arts") 

In this case, I believe that if I select "last 24 hours" in the time selector, the query will still search index A, which is unnecessary. I guess it would be more efficient to add a time limit to the first part, to restrict the range of events.

(earliest=-6mon latest="06/01/2024:00:00:00" index="A" "reports" "arts") OR (earliest="06/01/2024:00:00:00" index="B" "reports" "arts")

I expected Splunk to take the intersection of the two time ranges, but it doesn't. Surprisingly, adding these slows the query down: the "earliest" and "latest" I added override the time selector. Even though I selected "last 24 hours", it returns events from the past 6 months of index A.

Again, my first query should give the correct result, but I'm still wondering whether there's a way to improve efficiency using the 06/01 date.

Any suggestions are appreciated!


bowesmana
SplunkTrust

If you start to mix hard-coded timestamps in your query with expectations from the time picker, I think you will get confused. Is this in a dashboard or a general search?

If you want the time picker to be used, why is this not OK?

(index="A" OR index="B") "reports" "arts"

With this, both indexes are searched. If the time picker is set to last 24 hours, no data will be found in index=A, and if it is set to a range before 2024/06/01, no data will be found in index=B.

Whereas if your time picker runs from 2024/05/01 to 2024/07/01, it will find data from both indexes.

Don't start trying to optimise - Splunk does not work the way you seem to be implying. Data in Splunk is stored in time buckets, so there will be NO time buckets for index=A after 2024/06/01, so there is no data to search and the same for index=B for time before 2024/06/01.

You don't need to worry about efficiency of the search for this - Splunk is good at this.
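To check that claim against your own data, a quick sketch like the following should show where each index's events actually start and stop. It assumes the indexes really are named A and B as in the question; first_event and last_event are just illustrative field names. Run it over a time range that spans the cut-over date.

| tstats count min(_time) AS first_event max(_time) AS last_event where index=A OR index=B by index
``` ASSUMPTION: A and B are the placeholder index names from the question; first_event and last_event are illustrative names ```
``` convert ctime() renders the epoch values as readable timestamps ```
| convert ctime(first_event) ctime(last_event)

If last_event for A falls before 2024/06/01 and first_event for B falls on or after it, the two indexes do not overlap in time and there is nothing to de-duplicate.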


syk19567
Explorer

Thanks for your reply!

It's a dashboard, and we may need to run a query to check something as well.

I agree with what you said: checking empty buckets wouldn't take much time. I was assuming the previous bucket was still getting some logs, so ignoring its logs after the transition date could be faster and would save me from removing duplicates. In my case, though, I believe it should be empty.


bowesmana
SplunkTrust

Don't pre-optimize - if you think you may have duplicates, then plan to deal with them conceptually in the search logic, i.e. how you would recognise those events as duplicates.

As it's a dashboard, you can set tokens to play with time, so you could easily set limiting tokens to control the date ranges for the two indexes.

For example, if your search range is the last 3 months, you can have a search like this:

| makeresults
| addinfo
| eval cut_off_date=strptime("2024-06-01", "%F")
``` INDEX A ```
| eval index_a_earliest = min(info_min_time, cut_off_date)
| eval index_a_latest   = min(info_max_time, cut_off_date)

``` INDEX B ```
| eval index_b_earliest = max(info_min_time, cut_off_date)
| eval index_b_latest   = max(info_max_time, cut_off_date)

and then set tokens in a <done> clause for these values, i.e.

<done>
  <set token="index_a_earliest">$result.index_a_earliest$</set>
  <set token="index_a_latest">$result.index_a_latest$</set>
  <set token="index_b_earliest">$result.index_b_earliest$</set>
  <set token="index_b_latest">$result.index_b_latest$</set>
</done>

and then in your searches use the tokens to define the search

(index=A earliest=$index_a_earliest$ latest=$index_a_latest$) OR
(index=B earliest=$index_b_earliest$ latest=$index_b_latest$)...
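As a worked example of the min/max clamping, the sketch below hard-codes the picker range from the earlier example (2024/05/01 to 2024/07/01) instead of reading it from addinfo, so it can be pasted into the search bar on its own. The hard-coded range is an assumption for illustration only; the other field names are the same as in the makeresults search above.

| makeresults
``` ASSUMPTION: stand-in for the addinfo fields, fixed to 2024-05-01 .. 2024-07-01 ```
| eval info_min_time=strptime("2024-05-01", "%F"), info_max_time=strptime("2024-07-01", "%F")
| eval cut_off_date=strptime("2024-06-01", "%F")
``` index A never extends past the cut-off, index B never starts before it ```
| eval index_a_earliest=min(info_min_time, cut_off_date), index_a_latest=min(info_max_time, cut_off_date)
| eval index_b_earliest=max(info_min_time, cut_off_date), index_b_latest=max(info_max_time, cut_off_date)
``` show the four boundaries as dates instead of epoch seconds ```
| foreach index_* [ eval <<FIELD>>=strftime('<<FIELD>>', "%F") ]
| table index_a_earliest index_a_latest index_b_earliest index_b_latest

This should give index A the range 2024-05-01 to 2024-06-01 and index B the range 2024-06-01 to 2024-07-01. With a picker range entirely after the cut-off, such as last 24 hours, both index A values clamp to the cut-off date, so the index A clause is left with a zero-length range.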

yuanliu
SplunkTrust

I need a bit of clarification. Do you mean that the following is faster than, or about as fast as, your second search?

earliest=-6mon latest=now
  (index="A" "reports" "arts") OR (index="B" "reports" "arts")

In other words, is your first search, with earliest set to the last 6 months and latest set to now (presumably in the time selector), faster than or as fast as limiting each dataset in the search command?


syk19567
Explorer

Sorry for the confusion. I have two time ranges.

One comes from the time selector and is used to return results that happened in the range I'm interested in.

The other is hard-coded in the query. I want to force Splunk to search index A's events at most from 6 months ago to 06/01/24 (during this time, logs went to index A only), and index B's events at most from 06/01/24 to now. I want Splunk to automatically find the intersection of this hard-coded range and the range from the time selector.
