Getting Data In

Index time based retention - based on indexed time or event time?

Runals
Motivator

This information is probably in one of the docs, but I didn't find it in anything I've read just now. Under normal circumstances, current data rolls in and rolls out based on any number of parameters, such as frozenTimePeriodInSecs. What happens when you ingest a bunch of historical data, though, and how does that impact retention? If retention is strictly size based, that is one thing, but time based seems to be another. My gut says this would be based on indexed time, but I'm not sure how historical data and timestamps play into bucket creation.

1 Solution

sowings
Splunk Employee

It's based upon the event time.

A bucket (the constituent of an index; read more here) spans a range of time. This range is set by the event times of the events in that bucket. A bucket is a candidate for rotation (hot to warm, warm to cold, or cold to frozen) when it is the oldest bucket "in scope"(*). "Oldest" by this definition is based on the newest event time in the bucket. So a bucket can contain events from 2010 plus a single event from June 21, 2013, and it won't be a candidate for time-based rules until frozenTimePeriodInSecs after June 21, 2013.

Note also that the most restrictive rule applies, so if an index is nowhere near full, but the time-based rule says it's time to go, then the bucket will be frozen (consider the _internal index; it has a max size of 500GB, but a retention time period of only 28 days).

(*) Scope can be an entire volume spanning multiple indexes (with volume:foo directives), a single index, or a bucket state within an index, such as "warm buckets".
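To make the time-based rule concrete, here is a minimal Python sketch of the check described above. It is only an illustration, not Splunk's actual code: the `Bucket` class, the helper names, and the 90-day retention value are all assumptions made up for the example. The one thing taken from the answer is that the comparison uses the newest event time in the bucket.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Bucket:
    """Hypothetical stand-in for a bucket: only its event-time range matters here."""
    earliest: int  # epoch seconds of the oldest event in the bucket
    latest: int    # epoch seconds of the newest event in the bucket


def epoch(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date as epoch seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def past_time_retention(bucket: Bucket, now: int, frozen_time_period_in_secs: int) -> bool:
    """True once the NEWEST event in the bucket is older than the retention period."""
    return now - bucket.latest > frozen_time_period_in_secs


# Assumed retention of 90 days, chosen only for this example.
FROZEN_SECS = 90 * 86400

# Mostly 2010 data, plus a single event from June 21, 2013.
mixed = Bucket(earliest=epoch(2010, 1, 1), latest=epoch(2013, 6, 21))

print(past_time_retention(mixed, epoch(2013, 7, 1), FROZEN_SECS))   # False
print(past_time_retention(mixed, epoch(2013, 10, 1), FROZEN_SECS))  # True
```

The 2010 events do not make the bucket eligible any earlier; the single 2013 event holds the whole bucket until the retention period has passed since that event.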


chimbudp
Contributor

Indexed data keeps the original timestamp of the incoming events.
So every event is aligned to its event time, not its indexed time.
Later, the data is moved from hot -> warm -> cold -> frozen (based on indexes.conf settings).
To search historical data, we need to restore the indexed data to the thawed path and rename the indexes (see the Splunk documentation on restoring archived data); then we are able to see the historical events with their historical timestamps.


sowings
Splunk Employee

As a follow-up to this, note that thawed data lives outside of any retention policy whatsoever. The buckets therein must be managed manually.


apujar
Splunk Employee

Data retention is not based on _time; it's actually based on _indextime and the max size set. For example, if I index the sample data below now:

2020-03-02 12:23:23 blah blah

Retention time: 6 months
Maxsize: 100GB

then the _time of the event will be 2020-03-02 12:23:23, but the _indextime will be 2025-06-25 HH:MM:SS,

so this data will not get deleted immediately even though the _time of this event is 5 years old.


PickleRick
SplunkTrust

Firstly, a golden shovel. This is a very very old thread.

Secondly, you are mistaken. The event will not get "immediately deleted", but for a completely different reason. There are several factors here:

- events are not handled on their own but by buckets

- hot buckets do not roll to frozen directly

- "unusual" events (too far in the past or "from the future") are indexed in quarantine buckets which might get rolled completely differently than your normal buckets.


immortalraghava
Path Finder

What happens when the old data is in a hot bucket? Does this

"This range is set by the event time of the events in that bucket."

still apply here? The folder name of a hot bucket does not show the range the way it does for warm buckets.
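For reference, the folder-name difference can be sketched in Python. This assumes the usual directory naming, where a warm or cold bucket is named db_<newest_time>_<oldest_time>_<localid> and a hot bucket is named hot_v1_<localid>; the function name and the sample directory names are made up for the example. It only reads what the name encodes, so it says nothing about how a hot bucket tracks its range internally.

```python
import re
from datetime import datetime, timezone

# Assumed warm/cold naming: db_<newest_epoch>_<oldest_epoch>_<localid>
WARM_DIR = re.compile(r"^db_(\d+)_(\d+)_(\d+)")


def bucket_time_range(dirname: str):
    """Return (oldest, newest) event times encoded in a bucket directory name.

    Returns None when the name carries no timestamps, as with hot_v1_<localid>.
    """
    match = WARM_DIR.match(dirname)
    if match is None:
        return None
    newest, oldest = int(match.group(1)), int(match.group(2))
    return (
        datetime.fromtimestamp(oldest, tz=timezone.utc),
        datetime.fromtimestamp(newest, tz=timezone.utc),
    )


# Hypothetical directory names, for illustration only.
print(bucket_time_range("db_1371772800_1262304000_12"))
print(bucket_time_range("hot_v1_7"))  # None
```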
