Hi All,
We are ingesting batch application logs into an index (let's call it "myindex") with a scripted input that runs every 5 minutes on our UNIX production machine. The scripted input runs with this crontab:
*/5 * * * *
so it starts at minutes 0, 5, 10, 15, etc. of every hour and takes a few seconds to run.
After that, the logs are ingested by Splunk and indexed in "myindex".
Because these are custom logs, we have written several regular expressions to extract the valuable fields. There are many of them and they are applied at search time, which slows down SPL execution on large data sets.
For this reason we have implemented a Scheduled Report that saves its output in a Summary Index (let's call it "mysummaryindex").
The Scheduled Report runs every 5 minutes, extracting the last 5 minutes of data, with this crontab set up in Splunk:
2,7,12,17,22,27,32,37,42,47,52,57 * * * *
so it starts at minutes 2, 7, 12, 17, etc. of every hour and takes a few seconds to run.
We have delayed the run by 2 minutes to give Splunk time to index the data.
Below you can find a schema of what we have explained above:
Now comes the question 🙂
Because some batch jobs are long-running, the _time of our events represents the start of the batch job, so by the time the events are indexed in "myindex" their _time can lie well in the past.
For this reason we need a way to force the Scheduled Report to select events based on _indextime instead of _time.
I have looked at this post:
https://answers.splunk.com/answers/171/using-indextime-to-specify-time-range.html
and tried to apply it to the SPL in the Scheduled Report as follows:
index="myindex" sourcetype="mysourcetype" host="myhost" AND source="myfiles.*"
_index_earliest=-5m@m _index_latest=@m
| rex field=_raw "myregularExpression1..."
| rex field=_raw "myregularExpression2..."
etc...
but it seems to be missing some data (although it is not easy to compare what it should have picked up with what it has actually summarized).
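One way the comparison could be made easier, as a rough sketch only: group the raw events by the 5-minute window in which they were indexed and look at how far _time lags behind _indextime. The 48-hour lookback and the field names lag_seconds, indexed_at, events and max_lag_seconds are illustrative assumptions, not part of the original report:

```spl
index="myindex" sourcetype="mysourcetype" host="myhost" source="myfiles.*" earliest=-48h@h latest=now
| eval lag_seconds = _indextime - _time
| eval indexed_at = strftime(_indextime - (_indextime % 300), "%Y-%m-%d %H:%M")
| stats count AS events max(lag_seconds) AS max_lag_seconds BY indexed_at
| sort indexed_at
```

The events count per indexed_at window can then be compared with what landed in "mysummaryindex" for the corresponding run, and max_lag_seconds gives an idea of how far back the _time range of the report has to reach.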
Do you see something wrong?
Thanks a lot,
Edoardo
Hi All,
I figured out how to solve this issue.
Basically, the instruction below:
_index_earliest=-5m@m _index_latest=@m
works perfectly, but in Splunk this piece of SPL does not drive the time range picker as I was expecting: the picker still filters on _time. In fact, I haven't found a way to override it so far.
So in my case it is important that the Splunk time range picker is set up with a 48-hour time range (to avoid missing any event).
With that range and "_index_earliest=-5m@m _index_latest=@m" you extract only the events indexed in the last 5 minutes (which is what I was looking for), and the _index_earliest/_index_latest time modifiers improve the query performance considerably.
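As a sketch of the combination described above, the event-time range can also be written directly in the search with the earliest and latest time modifiers. As far as I know, these take precedence over the time range picker when they appear in the search string, but please verify this on your own version. The 48-hour value is the one from this post, and the regular expressions are left elided as in the original search:

```spl
index="myindex" sourcetype="mysourcetype" host="myhost" source="myfiles.*"
    earliest=-48h@h latest=now
    _index_earliest=-5m@m _index_latest=@m
| rex field=_raw "myregularExpression1..."
| rex field=_raw "myregularExpression2..."
```

Here earliest/latest bound _time (wide enough to cover long-running batch jobs), while _index_earliest/_index_latest keep only the events indexed in the last 5 minutes. If the longest batch job can run for more than 48 hours, the _time window would need to be widened accordingly.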
Hope this can help you!