I ran into a scenario where I could not pin down the Splunk indexer's behavior.
The retention period for an index is configured as 6 years.
I tried sending some logs that are older than the configured retention period.
Sometimes the logs get into the index and sometimes they don't. (I run a simple search to count the events.)
(The log file did reach the indexer; that part is tested. I find entries in metrics.log.)
What could be the reason for this intermittent behavior? At which stage does the filtering based on retention period take place?
Will these old events also go through the hot, warm, and frozen states of a bucket?
Any clarifications would be helpful.
Depending on how consistent the timestamps in your data are, you may occasionally receive an event today that carries a very old timestamp, say Dec 2013 (possibly due to a bug, wrong logging, or bad timestamp parsing). A data bucket is frozen only when the latest event (highest timestamp) in the bucket is older than your retention period. If the old data was received recently, it will be stored in a bucket whose latest event is within the retention period, so that bucket will not roll to frozen. All Splunk queries, reports, and dashboards will then show the earliest timestamp on the index as Dec 2013, even though your retention is only 1 year.
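To make the freezing rule above concrete, here is a minimal Python sketch. It is not Splunk code: the `Bucket` class and `is_eligible_for_freezing` function are hypothetical names used only to illustrate the rule that the time-based check looks at the bucket's latest event, not its earliest.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical stand-in for an index bucket (times in epoch seconds)."""
    earliest_event: int
    latest_event: int


def is_eligible_for_freezing(bucket: Bucket, now: int, frozen_time_period_in_secs: int) -> bool:
    # Time-based retention compares the bucket's LATEST event against the
    # retention period; the earliest event plays no part in the decision.
    return now - bucket.latest_event > frozen_time_period_in_secs


ONE_YEAR = 365 * 86400          # 1-year retention, as in the example above
NOW = 1_700_000_000             # arbitrary "current time" for the illustration

# Bucket holding a stray Dec 2013 event next to data from yesterday.
mixed = Bucket(earliest_event=1_386_000_000, latest_event=NOW - 86400)
# Bucket whose newest event is already two years old.
stale = Bucket(earliest_event=NOW - 3 * ONE_YEAR, latest_event=NOW - 2 * ONE_YEAR)

print(is_eligible_for_freezing(mixed, NOW, ONE_YEAR))
print(is_eligible_for_freezing(stale, NOW, ONE_YEAR))
```

The first bucket stays searchable despite containing the 2013 event, which is why the index can report an earliest timestamp far outside the retention period.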
My suggestion would be to also enforce your data retention based on total index size (maxTotalDataSizeMB) along with the retention period (frozenTimePeriodInSecs). This way you can start rolling data buckets to frozen before you run out of space. See this for more details.
Have a look at this answer.
Let me know if this helps!
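As a rough illustration of combining the two settings, the Python sketch below converts a retention period in days to seconds and prints an indexes.conf-style stanza. The index name `my_index` and the size cap of 500000 MB are placeholders I picked for the example, and `retention_stanza` is a hypothetical helper, not part of any Splunk tooling.

```python
SECONDS_PER_DAY = 86400


def retention_stanza(index_name: str, retention_days: int, max_total_size_mb: int) -> str:
    """Build an indexes.conf-style stanza with both retention limits."""
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{index_name}]",
        # Time-based limit: buckets whose latest event is older than this roll to frozen.
        f"frozenTimePeriodInSecs = {frozen_secs}",
        # Size-based limit: the oldest buckets roll to frozen once the index exceeds this.
        f"maxTotalDataSizeMB = {max_total_size_mb}",
    ])


# 6 years of retention (taken as 6 * 365 days) with a placeholder size cap.
print(retention_stanza("my_index", 6 * 365, 500000))
```

With both limits set, whichever one is reached first triggers the roll to frozen.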
Thanks for the answer. But what I really hit was this. I just found this.
Even with quarantined buckets I find some inconsistencies. Sometimes old data, older than quarantinePastSecs, gets into an ordinary hot bucket. Maybe someone from Splunk should clarify this. There are already some comments on the accepted answer that are still not addressed.
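For reference, the behavior one would expect from the quarantine settings can be sketched in Python as follows. This only models the documented intent and does not explain the inconsistency described above. The function name `expected_bucket_kind` is hypothetical, and the two constants are what I understand the indexes.conf defaults to be (900 days and 30 days); check them against your own configuration.

```python
# Assumed defaults for quarantinePastSecs / quarantineFutureSecs; verify locally.
QUARANTINE_PAST_SECS = 77_760_000    # 900 days
QUARANTINE_FUTURE_SECS = 2_592_000   # 30 days


def expected_bucket_kind(event_time: int, now: int,
                         past_secs: int = QUARANTINE_PAST_SECS,
                         future_secs: int = QUARANTINE_FUTURE_SECS) -> str:
    """Return which kind of hot bucket an event is expected to land in."""
    too_old = event_time < now - past_secs
    too_new = event_time > now + future_secs
    return "quarantine" if (too_old or too_new) else "hot"


NOW = 1_700_000_000                  # arbitrary "current time" for the illustration
SIX_YEARS = 6 * 365 * 86400

print(expected_bucket_kind(NOW - SIX_YEARS, NOW))   # event older than the past window
print(expected_bucket_kind(NOW - 3600, NOW))        # event from an hour ago
```

An event older than the past window landing in an ordinary hot bucket would contradict this model, which is the inconsistency being reported.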
@immortalraghavan how did you check whether the event went to the quarantine bucket or to a normal hot bucket? If it was by search, the events in the quarantine bucket will still be returned when searching for that log.
The only reason it won't be searchable after logging is if the bucket gets frozen based on the retention policy (either size based or time based).
The data has not rolled; it is still in the hot quarantine bucket. The bucket was not there before I sent the old data. That's how I confirmed that my current ingest created it. But it is not showing up in the results. Is there any other way I could check?