Hi, is there a best practice to achieve the following?
I am looking to search for events and then to output them to a csv on an hourly schedule but want to ensure that any events that are delayed are also caught and sent in the next schedule.
An example would be:
All events from source X arriving between 8:00 and 9:00 are exported to csv by scheduled search.
An event from source X with an event timestamp of 8:30 misses this search window and is indexed at 9:30.
How can I ensure that the 8:30 event is included in the scheduled search between 9:00 and 10:00, but that events already sent between 8:00 and 9:00 are not sent again?
Is it best to use the index time rather than extract the time from the event during indexing?
Is there a way to add a tag or field to each event so that I know when it has previously been exported during the scheduled search and can use this to exclude from subsequent searches?
If your logs do not arrive in real time (for example, you can receive yesterday's data today), it is better to filter on index time in your searches. Find out the maximum latency you can expect (say 30 days), set the search time range to cover that (or use All time), and add an index-time filter.
index=foo sourcetype=blah earliest=-30d _index_earliest=-1h@h _index_latest=@h | rest of the search
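As a rough sketch of how this could look for the hourly CSV export, the search below selects events by the hour in which they were indexed, so a late 8:30 event indexed at 9:30 falls into the 9:00 to 10:00 run and nothing from the earlier run is repeated. The 30-day lookback, the table columns, and the file name hourly_export are assumptions for illustration; index=foo and sourcetype=blah are the placeholders from the search above.

```
index=foo sourcetype=blah earliest=-30d _index_earliest=-1h@h _index_latest=@h
| eval index_time=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| eval lag_seconds=_indextime - _time
| table _time index_time lag_seconds host source _raw
| outputcsv hourly_export
```

The lag_seconds column is only there to help you measure how late events really arrive, so you can size the earliest lookback. Each run overwrites the CSV of the same name unless you pass append=true to outputcsv.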
What if you started your scheduled job at 15 minutes past the hour and had the search return data from the previous hour (earliest=-1h@h latest=@h)? That would cover events that arrive up to 15 minutes late.