Hi community,
My forwarder sent logs to index A before 2024/06/01 and has been sending them to index B since that date. To avoid missing any data when searching, I need a query that searches both indexes:
(index="A" "reports" "arts") OR (index="B" "reports" "arts")
In this case, I believe that if I select "last 24 hours" in the time selector, the query will still search index A, which is unnecessary. I guess it would be more efficient to add a time limit to each part, to restrict the range of events:
(earliest=-6mon latest="06/01/2024:00:00:00" index="A" "reports" "arts") OR (earliest="06/01/2024:00:00:00" index="B" "reports" "arts")
I expected Splunk to take the intersection of the two time ranges, but it doesn't. The "earliest" and "latest" I added override the time selector: even though I selected "last 24 hours", it returns events from the past 6 months of index A. I also noticed that adding them surprisingly slows down the query.
Again, my first query should give the correct result, but I'm still wondering whether there's a way to improve efficiency using the 06/01 cutoff date.
Any suggestions are appreciated!
If you start mixing hard-coded timestamps in your query with expectations from the time picker, I think you will get confused. Is this in a dashboard or a general search?
If you want the time picker to be used, then why is this not OK?
(index="A" OR index="B") "reports" "arts"
Both indexes will then be searched for data. If the time picker is set to last 24 hours, no data will be found in index=A, and if it is set to a range before 2024/06/01, no data will be found in index=B.
If your time picker runs from 2024/05/01 to 2024/07/01, it will find data from both indexes.
Don't start trying to optimise - Splunk does not work the way you seem to be implying. Data in Splunk is stored in time buckets, so there will be NO time buckets for index=A after 2024/06/01, so there is no data to search and the same for index=B for time before 2024/06/01.
You don't need to worry about efficiency of the search for this - Splunk is good at this.
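To make the bucket argument above concrete, here is a rough Python sketch of the idea. It is a simplified model, not Splunk's actual implementation: the Bucket class, the buckets_to_scan function and the sample bucket dates are all made up for illustration. It only shows why a search window that does not overlap any of an index's buckets leaves nothing to scan in that index.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def ts(year, month, day):
    """Epoch seconds for midnight UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


@dataclass
class Bucket:
    # Hypothetical stand-in for an index bucket: only its time span matters here.
    index: str
    earliest: float
    latest: float


def buckets_to_scan(buckets, search_earliest, search_latest):
    """Keep only the buckets whose time span overlaps the search window."""
    return [
        b for b in buckets
        if b.earliest < search_latest and b.latest >= search_earliest
    ]


CUTOFF = ts(2024, 6, 1)

# Made-up bucket layout: index A stops at the cutoff, index B starts there.
buckets = [
    Bucket("A", ts(2024, 4, 1), ts(2024, 5, 1)),
    Bucket("A", ts(2024, 5, 1), CUTOFF - 1),
    Bucket("B", CUTOFF, ts(2024, 7, 1)),
    Bucket("B", ts(2024, 7, 1), ts(2024, 8, 1)),
]

# A window entirely after the cutoff overlaps no index A bucket.
recent = buckets_to_scan(buckets, ts(2024, 7, 15), ts(2024, 7, 16))
print([b.index for b in recent])

# A window straddling the cutoff overlaps buckets from both indexes.
straddle = buckets_to_scan(buckets, ts(2024, 5, 15), ts(2024, 6, 15))
print([b.index for b in straddle])
```

In this model the index=A clause costs next to nothing for a recent window, because every A bucket is ruled out by its time span alone.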
Thanks for your reply!
It's a dashboard, and we may need to run a query to check something as well.
I agree with what you said: checking empty buckets shouldn't take much time. I was assuming the previous bucket might still be getting some logs, so ignoring logs after the transition date could be faster and would save me from removing duplicates. In my case, though, I believe it should be empty.
Don't pre-optimize - if you think you may have duplicates, then plan to deal with them conceptually in the search logic, i.e. how would you recognise those events as duplicates?
As it's a dashboard, you can set tokens to play with time, so you could easily set limiting tokens to control the date ranges for the two indexes.
For example, if your search range is the last 3 months, you can have a search such as:
| makeresults
| addinfo
| eval cut_off_date=strptime("2024-06-01", "%F")
``` INDEX A ```
| eval index_a_earliest = min(info_min_time, cut_off_date)
| eval index_a_latest = min(info_max_time, cut_off_date)
``` INDEX B ```
| eval index_b_earliest = max(info_min_time, cut_off_date)
| eval index_b_latest = max(info_max_time, cut_off_date)
and then set tokens in a <done> clause for these values, i.e.
<done>
<set token="index_a_earliest">$result.index_a_earliest$</set>
<set token="index_a_latest">$result.index_a_latest$</set>
<set token="index_b_earliest">$result.index_b_earliest$</set>
<set token="index_b_latest">$result.index_b_latest$</set>
</done>
and then in your searches use the tokens to define the search
(index=A earliest=$index_a_earliest$ latest=$index_a_latest$) OR
(index=B earliest=$index_b_earliest$ latest=$index_b_latest$)...
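As a sanity check on the min/max evals above, here is a small Python sketch of the same arithmetic. The function names split_ranges and is_empty are made up for illustration, and the sketch assumes that a range whose earliest equals its latest matches no events. It only mirrors the clamping logic; it does not call Splunk.

```python
from datetime import datetime, timezone


def ts(year, month, day):
    """Epoch seconds for midnight UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


CUTOFF = ts(2024, 6, 1)


def split_ranges(picker_earliest, picker_latest, cutoff=CUTOFF):
    """Mirror the evals: index A may not run past the cutoff, index B may not start before it."""
    index_a = (min(picker_earliest, cutoff), min(picker_latest, cutoff))
    index_b = (max(picker_earliest, cutoff), max(picker_latest, cutoff))
    return index_a, index_b


def is_empty(time_range):
    """Assumption: a range with earliest >= latest matches nothing."""
    earliest, latest = time_range
    return earliest >= latest


# Picker straddles the cutoff: each index gets its own side of it.
a, b = split_ranges(ts(2024, 5, 1), ts(2024, 7, 1))
print(a == (ts(2024, 5, 1), CUTOFF), b == (CUTOFF, ts(2024, 7, 1)))

# Picker entirely after the cutoff: index A collapses to an empty range.
a_late, b_late = split_ranges(ts(2024, 7, 15), ts(2024, 7, 16))
print(is_empty(a_late), is_empty(b_late))

# Picker entirely before the cutoff: index B collapses to an empty range.
a_early, b_early = split_ranges(ts(2024, 4, 1), ts(2024, 5, 1))
print(is_empty(a_early), is_empty(b_early))
```

Note that this clamps only at the cutoff date. A fixed lower bound for index A, such as six months back, would need an extra max() on the index A earliest value.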
I need a bit of clarification. Do you mean that the following is faster than, or about as fast as, your second search?
earliest=-6mon latest=now
(index="A" "reports" "arts") OR (index="B" "reports" "arts")
In other words, is your first search, with earliest set to the last 6 months and latest set to now (presumably in the time selector), faster than or as fast as limiting each dataset in the search command?
Sorry for the confusion. I have two sets of time ranges.
One comes from the time selector and is used to return results that occurred in the range I'm interested in.
The other is hard-coded in the query. I want to force Splunk to search index A's events at most in the range from 6 months ago to 06/01/24 (during this time, logs went to index A only), and index B's at most in the range from 06/01/24 to now. I want Splunk to automatically find the intersection of this hard-coded range and the range from the time selector.