Does retention policy depend on indextime or _time?

nawazns5038
Builder

How can we get the oldest index time of an index?

Does the retention policy depend on _indextime or _time?

somesoni2
Revered Legend

The retention policy depends on the _time values, not the index time. You can run the following query to find the min/max _time values of an index:

| rest /services/data/indexes | table title frozenTimePeriodInSecs minTime maxTime

Where,

maxTime - ISO8601 format timestamp of the newest event time in the index.
minTime - ISO8601 format timestamp of the oldest event time in the index.

nawazns5038
Builder

I have an index that has data from two years ago as well as the latest data. How is the retention policy applied to that data? The index has a retention of 60 days.

somesoni2
Revered Legend

Data for each index is stored in buckets, which move through the stages Hot -> Warm -> Cold (from warm onwards a bucket is read-only). Once a bucket is in the cold stage, the retention policy checks the timestamp of the newest event in the bucket. If that timestamp is older than the retention period (60 days in your case) counted back from now, the bucket is rolled to frozen; otherwise it is kept as is. You can run the following query for an index to see all its buckets, along with the timestamps of the oldest (startEpoch) and newest (endEpoch) event in each:

| dbinspect index=YourIndexNameHere earliest=0 | table index bucketId startEpoch endEpoch | convert ctime(*Epoch) as *HumanReadable
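To make the freeze rule concrete, here is a small Python sketch of it. It is a simplified model of the behaviour described in this answer, not Splunk's actual code: it assumes the dbinspect columns (bucketId, startEpoch, endEpoch, state) have been copied into plain tuples, and the function name buckets_to_freeze and the sample rows are hypothetical.

```python
import time

DAY = 86400  # seconds per day

def buckets_to_freeze(buckets, frozen_period_secs, now=None):
    """Return the bucketIds this simplified model would roll to frozen.

    Each bucket is a (bucketId, startEpoch, endEpoch, state) tuple, mirroring
    the dbinspect columns. Only cold buckets are considered, and only the
    newest event (endEpoch) is compared against the retention period.
    """
    if now is None:
        now = time.time()
    cutoff = now - frozen_period_secs  # events older than this are past retention
    return [
        bucket_id
        for bucket_id, _start, end, state in buckets
        if state == "cold" and end < cutoff
    ]

# Hypothetical dbinspect rows, with a fixed "now" so the result is repeatable.
now = 1_700_000_000
buckets = [
    (1, now - 800 * DAY, now - 700 * DAY, "cold"),  # all events old: freezes
    (2, now - 730 * DAY, now - 1 * DAY, "cold"),    # old and new events: stays
    (3, now - 200 * DAY, now - 100 * DAY, "warm"),  # old but still warm: stays
]
print(buckets_to_freeze(buckets, 60 * DAY, now=now))  # [1]
```

Bucket 2 shows why a search can still return two-year-old events: they share a bucket with recent events, so the whole bucket is kept.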

nawazns5038
Builder

Thanks for the information, that gives me a bit more clarity.

So the retention period applies only to cold buckets? What if a warm bucket has data older than 60 days but hasn't rolled to cold because it hasn't reached the bucket rolling limit?

somesoni2
Revered Legend

That is correct: the retention policy only applies to cold buckets. If a warm bucket has data older than the retention period, the data will stay there. That also suggests your warm-to-cold rolling configuration (the maxWarmDBCount setting) may not be adequate for that index. Consider lowering that number and/or configuring hot-to-warm rolling properly so that enough warm buckets are generated. (A bucket rolls from hot to warm when its size reaches the maxDataSize limit or when the time span of its events exceeds maxHotSpanSecs.)
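The two hot-to-warm triggers can be sketched as a simple check. This is an illustration under stated assumptions, not Splunk's logic: should_roll_to_warm is a hypothetical helper, maxHotSpanSecs is treated as a limit on the time span of events in the bucket, maxDataSize is taken as about 10 GB (what auto_high_volume targets on 64-bit systems), and restarts and other roll reasons are ignored.

```python
DAY = 86400  # seconds per day

def should_roll_to_warm(size_mb, earliest_epoch, latest_epoch,
                        max_data_size_mb=10_240,
                        max_hot_span_secs=90 * DAY):
    """Simplified check for the two hot-to-warm triggers named above.

    max_data_size_mb stands in for maxDataSize (about 10 GB assumed here);
    max_hot_span_secs stands in for maxHotSpanSecs (default 90 days).
    Restarts and other roll reasons are ignored.
    """
    too_big = size_mb >= max_data_size_mb
    too_wide = (latest_epoch - earliest_epoch) > max_hot_span_secs
    return too_big or too_wide

# A low-volume hot bucket holding 30 days of events: neither limit is hit.
print(should_roll_to_warm(500, 0, 30 * DAY))                             # False
# Same bucket with maxHotSpanSecs lowered to 7 days: it rolls.
print(should_roll_to_warm(500, 0, 30 * DAY, max_hot_span_secs=7 * DAY))  # True
```

This is why a low-volume index can keep the same hot bucket open for a long time unless maxHotSpanSecs is lowered.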

nawazns5038
Builder

Oh, that's bad. Is it recommended to remove all the extra settings for hot and warm buckets and just use homePath, coldPath and frozenTimePeriodInSecs, so that Splunk rolls the buckets itself and the retention policy works as expected?

homePath = volume:hotwarm/XXXXXX/db
coldPath = volume:cold/XXXXXX/colddb
thawedPath = $SPLUNK_DB/XXXXXX/thaweddb
homePath.maxDataSizeMB = 10000
coldPath.maxDataSizeMB = 356640
frozenTimePeriodInSecs = 5005600
maxTotalDataSizeMB = 1766400
maxDataSize = auto_high_volume

Suppose I use the above settings in indexes.conf and the data always stays below maxTotalDataSizeMB. Is there a way to tell how long data will stay searchable if it arrives today?
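Converting the stanza's values into days and GB makes them easier to reason about. This Python snippet only does arithmetic on the settings as posted; it does not model how Splunk applies them. Note that 5005600 seconds works out to about 57.9 days, slightly less than the 60-day retention mentioned earlier.

```python
DAY = 86400  # seconds per day

# Values copied from the indexes.conf stanza above.
frozen_time_period_in_secs = 5005600
sizes_mb = {
    "homePath.maxDataSizeMB": 10000,
    "coldPath.maxDataSizeMB": 356640,
    "maxTotalDataSizeMB": 1766400,
}

retention_days = frozen_time_period_in_secs / DAY
print(f"frozenTimePeriodInSecs: {retention_days:.1f} days")  # 57.9 days
print(f"60 days would be {60 * DAY} seconds")                 # 5184000
for name, mb in sizes_mb.items():
    print(f"{name}: {mb / 1024:.1f} GB")
```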

somesoni2
Revered Legend

I believe you should set maxHotSpanSecs (default 90 days) to something smaller, say 7 days, and set maxWarmDBCount smaller, so that for your retention period of 60 days enough warm buckets roll to cold. (You'd need to check the current number of warm buckets with the dbinspect command; the state column gives each bucket's state: hot, warm, or cold.)

| dbinspect index=YourIndexNameHere earliest=0 | table index bucketId startEpoch endEpoch state | convert ctime(*Epoch) as *HumanReadable

somesoni2
Revered Legend

You can go with the default Splunk settings, except for the maxDataSize attribute, which should be set based on the amount of data you'll be ingesting into that index (set it to auto for small ingestion volumes, under 750MB/day; otherwise set it to auto_high_volume). I think part of your problem is the timestamps in your data. You might be ingesting data whose timestamps vary a lot: ingested data is generally close to the current time, but you may also have old historic data, which is why you see such a big range of event timestamps in a bucket. Usually that range is a few days, but in your case it is up to 2 years.

nawazns5038
Builder

Yeah, I'm having a hard time working out how long data will be retained in Splunk.

Is there a straightforward way to say how long data will be available if it comes in today, with the above settings or any other settings in use?

Is there any calculation that gives the last date the data will be available?

somesoni2
Revered Legend

With the current settings you can't, because nothing in them fixes when hot buckets roll to warm (or how many roll). If you apply a setting such as maxHotSpanSecs = 86400 (1 day), you'll know that every day all hot buckets (default count of 3) will roll to warm for sure (they can also roll earlier, based on size or when Splunk restarts). So you know how many buckets roll each day: 3. If your maxWarmDBCount = 180 (default is 300), then a bucket written today will roll to cold on the 61st day (180 / 3 = 60 days' worth of warm buckets). By then it is already past the 60-day retention period, so it is frozen once it reaches cold, and the data will be available for about 61 days.
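The day count in this answer can be reproduced with a few lines of Python. This is a sketch of the answer's own model only: day_bucket_rolls_to_cold is a hypothetical helper, and it assumes exactly the same number of hot buckets roll to warm every day and that maxWarmDBCount is a multiple of that number.

```python
def day_bucket_rolls_to_cold(buckets_rolled_per_day, max_warm_db_count):
    """Day (counting today as day 1) on which today's buckets roll to cold.

    Assumes a fixed number of hot buckets roll to warm every day
    (e.g. maxHotSpanSecs = 86400) and that the oldest warm buckets roll
    to cold once maxWarmDBCount is exceeded.
    """
    days_of_warm_buckets = max_warm_db_count // buckets_rolled_per_day
    return days_of_warm_buckets + 1

print(day_bucket_rolls_to_cold(3, 180))  # 61, the example above
print(day_bucket_rolls_to_cold(3, 300))  # 101, with the default maxWarmDBCount
```

With the default maxWarmDBCount of 300, today's data would stay warm for about 100 days before the 60-day retention could be applied to it.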

nawazns5038
Builder

I am able to see both new and old data when I search the index. How does retention apply here?

elliotproebstel
Champion

Here's a quick way to see the earliest and latest events in the index main:

| metadata type=sources index=main 
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime 
| convert ctime(firstTime) ctime(lastTime)

I believe retention policy depends on _time, but I'm having trouble finding documentation to back that up.

elliotproebstel
Champion

Best I can find regarding _time vs _indextime for this:

vix.output.buckets.older.than =
* Buckets must be this old before they will be archived.
* A bucket's age is determined by the earliest _time field of any event in the bucket.

Source: http://docs.splunk.com/Documentation/Splunk/7.0.1/Admin/Indexesconf

I can't say for sure whether that rule applies beyond virtual indexes, though, since that setting certainly is limited to them.

nawazns5038
Builder

It's a bit unclear. So if a bucket contains both two-year-old data and today's data, how will the bucket roll to frozen?

elliotproebstel
Champion

Retention policy applies to the newest event in a bucket. A bucket will not be retired until every event in that bucket is older than the retention policy date. If a given index doesn't receive a very high volume of data, then it won't retire buckets very often.
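Applied to the mixed bucket asked about above, the rule means only the newest event sets the date. A minimal Python sketch, assuming the event timestamps of one bucket are available as a list; earliest_freeze_time and the sample dates are hypothetical, and per the earlier answers the bucket must also have reached cold before it is actually frozen.

```python
from datetime import datetime, timedelta, timezone

def earliest_freeze_time(event_times, retention):
    """Earliest moment a bucket becomes eligible for retirement: every event
    must be older than the retention period, so only the newest event counts.
    Older events in the same bucket do not change the result."""
    return max(event_times) + retention

today = datetime(2018, 1, 15, tzinfo=timezone.utc)
# Hypothetical bucket mixing two-year-old events with today's events.
bucket_events = [today - timedelta(days=730), today - timedelta(days=400), today]
print(earliest_freeze_time(bucket_events, timedelta(days=60)))
# 2018-03-16 00:00:00+00:00
```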
