Getting Data In

What happens when an indexer encounters an event with a timestamp older than the configured retention period?

immortalraghava
Path Finder

I ran into a scenario where I could not predict the Splunk indexer's behavior.
The retention period for an index is configured as 6 years.
I tried sending some logs that are older than the configured retention period.

Sometimes the logs get into the index and sometimes they don't. (I run a simple search to count the events.)
(The log file did reach the indexer; I tested that part and found entries in metrics.log.)

What could be the reason for this intermittent behavior? At which stage does filtering based on the retention period take place?
Will this old event also go through the hot, warm, and frozen bucket states?
Any clarifications would be helpful.

Thanks


mayurr98
Super Champion

Depending on how consistent the timestamps in your data are, you may receive an event today that carries a very old timestamp, say Dec 2013 (for example, because of a bug, incorrect logging, or a timestamp-parsing error). A bucket is frozen only when the latest event (highest timestamp) in the bucket is older than your retention period. If the old data was received recently, it is stored in a bucket whose latest event is within the retention period, so that bucket will not roll to frozen. As a result, Splunk searches, reports, and dashboards will show the earliest timestamp in the index as Dec 2013, even though your retention is only 1 year.
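A minimal Python sketch of the time-based rule described above, assuming a bucket can be modeled simply as the list of event timestamps it holds. The `Bucket` class, the `should_freeze` helper, and the dates are hypothetical illustrations, not Splunk internals. It shows why one recent event keeps a bucket that also contains a Dec 2013 event from freezing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Bucket:
    """Hypothetical bucket model: just a name and its events' timestamps."""
    name: str
    event_times: List[datetime]

    @property
    def latest(self) -> datetime:
        return max(self.event_times)  # newest (highest) timestamp

    @property
    def earliest(self) -> datetime:
        return min(self.event_times)  # what "earliest event in the index" reflects


def should_freeze(bucket: Bucket, now: datetime, frozen_time_period_secs: int) -> bool:
    """Freeze only when the bucket's newest event is older than the retention period."""
    cutoff = now - timedelta(seconds=frozen_time_period_secs)
    return bucket.latest < cutoff


ONE_YEAR_SECS = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap days
now = datetime(2017, 3, 1, tzinfo=timezone.utc)
dec_2013 = datetime(2013, 12, 15, tzinfo=timezone.utc)

# One mis-timestamped Dec 2013 event arrived together with current events.
mixed = Bucket("hot_mixed", [dec_2013, now - timedelta(hours=1)])
# Every event in this bucket is genuinely old.
old = Bucket("old_only", [dec_2013, datetime(2014, 1, 10, tzinfo=timezone.utc)])

for b in (mixed, old):
    print(b.name, b.earliest.date(), should_freeze(b, now, ONE_YEAR_SECS))
# hot_mixed 2013-12-15 False
# old_only 2013-12-15 True
```

With a 1-year retention, `hot_mixed` is kept because its newest event is only an hour old, while `old_only` qualifies for freezing; both still report Dec 2013 as their earliest event.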

My suggestion would be to also enforce data retention based on total index size (maxTotalDataSizeMB) in addition to the retention period (frozenTimePeriodInSecs). This way, buckets start rolling to frozen before you run out of disk space. See this for more details:

https://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Setaretirementandarchivingpolicy#Freeze_d...
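To illustrate how the two settings can interact, here is a simplified Python model. It assumes that time-based freezing is decided per bucket from its newest event and that, when the total exceeds the size cap, the bucket with the oldest newest-event time is frozen first; Splunk's real ordering and size accounting (for example, how hot buckets count toward maxTotalDataSizeMB) may differ. `BucketInfo` and `buckets_to_freeze` are hypothetical names. The 6-year value mirrors the retention in the question, converted to seconds for frozenTimePeriodInSecs while ignoring leap days.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class BucketInfo:
    """Hypothetical summary of a bucket: newest event time and on-disk size."""
    name: str
    latest_epoch: int  # newest event timestamp in the bucket (epoch seconds)
    size_mb: int


def buckets_to_freeze(buckets: List[BucketInfo], now_epoch: int,
                      frozen_time_period_in_secs: int,
                      max_total_data_size_mb: int) -> Set[str]:
    """Simplified model combining a time-based and a size-based retention limit."""
    cutoff = now_epoch - frozen_time_period_in_secs
    # Time-based: freeze every bucket whose newest event is older than the cutoff.
    frozen = {b.name for b in buckets if b.latest_epoch < cutoff}
    # Size-based (assumed ordering): freeze the oldest remaining buckets until under the cap.
    remaining = sorted((b for b in buckets if b.name not in frozen),
                       key=lambda b: b.latest_epoch)
    total_mb = sum(b.size_mb for b in remaining)
    while remaining and total_mb > max_total_data_size_mb:
        oldest = remaining.pop(0)
        frozen.add(oldest.name)
        total_mb -= oldest.size_mb
    return frozen


DAY = 86_400
SIX_YEARS_SECS = 6 * 365 * DAY  # 189,216,000 s, ignoring leap days
now = 1_700_000_000
buckets = [
    BucketInfo("db_a", now - 7 * 365 * DAY, 100),  # older than 6 years
    BucketInfo("db_b", now - 3 * 365 * DAY, 400),
    BucketInfo("db_c", now - 1 * 365 * DAY, 400),
    BucketInfo("db_d", now - 1 * DAY, 300),
]
print(sorted(buckets_to_freeze(buckets, now, SIX_YEARS_SECS, 800)))
# ['db_a', 'db_b']
```

Here db_a is frozen by age alone, while db_b is frozen only because the three remaining buckets total 1100 MB, above the 800 MB cap. With a large enough cap, only db_a would be frozen.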

Also, have a look at this answer:
https://answers.splunk.com/answers/511747/why-is-the-retention-policy-not-working-on-certain.html

Let me know if this helps!


immortalraghava
Path Finder

Thanks for the answer. But what I actually ran into was this, which I just found:

https://answers.splunk.com/answers/31961/what-is-a-hot-quar-v1-directory-vs-standard-hot-v1.html

Even with quarantine buckets, I see some inconsistencies. Sometimes old data (older than quarantinePastSecs) goes into an ordinary hot bucket. Maybe someone from Splunk could clarify this. The accepted answer already has some comments that are still not addressed.
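A simplified Python model of the quarantine routing discussed in the linked answer: an event whose timestamp is more than quarantinePastSecs before (or quarantineFutureSecs after) the indexer's current time goes to a quarantine hot bucket instead of a normal one. The default values used below (900 days and 30 days) are assumptions to verify against indexes.conf.spec for your Splunk version, and `target_hot_bucket` and the `hot_quar`/`hot` labels are hypothetical. In this model the result depends on the indexer's clock at ingest time, not on the event's age alone.

```python
import time
from typing import Optional

# Assumed defaults; verify against indexes.conf.spec for your Splunk version.
QUARANTINE_PAST_SECS = 77_760_000   # 900 days
QUARANTINE_FUTURE_SECS = 2_592_000  # 30 days


def target_hot_bucket(event_epoch: float,
                      now_epoch: Optional[float] = None,
                      past_secs: int = QUARANTINE_PAST_SECS,
                      future_secs: int = QUARANTINE_FUTURE_SECS) -> str:
    """Hypothetical routing rule: 'hot_quar' if the event timestamp is too far
    in the past or future relative to the indexer's clock, otherwise 'hot'."""
    if now_epoch is None:
        now_epoch = time.time()  # indexer's current time at ingest
    too_old = event_epoch < now_epoch - past_secs
    too_new = event_epoch > now_epoch + future_secs
    return "hot_quar" if (too_old or too_new) else "hot"


DAY = 86_400
now = 1_700_000_000
print(target_hot_bucket(now - 1000 * DAY, now))  # 1000 days old -> hot_quar
print(target_hot_bucket(now - 10 * DAY, now))    # 10 days old -> hot
print(target_hot_bucket(now + 60 * DAY, now))    # 60 days ahead -> hot_quar
```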


mescober_splunk
Splunk Employee

@immortalraghavan, how did you check whether the event went to the quarantine bucket or to a normal hot bucket? If you checked by searching, note that events in a quarantine bucket are still returned when you search for that log.

The only reason it would not be searchable after being indexed is that its bucket was frozen by the retention policy (either size-based or time-based).


immortalraghava
Path Finder

The data has not rolled; it is still in the hot quarantine bucket. That bucket did not exist before I sent the old data, which is how I confirmed that my current ingest created it. But the events are not showing up in the search results. Is there any other way I could check?
