How to tune index retirement?

Path Finder

My initial goal was to specify an index lifetime. This configuration (indexes.conf) seemed to work:

[main]
frozenTimePeriodInSecs = 120
maxHotIdleSecs = 1728000

The only problem is that after restarting Splunk, all indexes are removed, regardless of their age. With this indexes.conf, I only ever see hot buckets before maxHotIdleSecs is reached. Is there a way to make indexes retire without removing them on every Splunk restart?

Legend

If the requirement is "based on the timestamp of the data, I want to keep only the last 20 days of data," then the setting should be

[yourindexname]
frozenTimePeriodInSecs = 1728000

because that is 60s * 60m * 24h * 20d = 1728000
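The arithmetic above can be checked with a small script. This is only a sketch: the helper names `retention_seconds` and `stanza` are made up for illustration and are not part of Splunk.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 60 s * 60 min * 24 h


def retention_seconds(days: int) -> int:
    """Convert a retention period in days into a frozenTimePeriodInSecs value."""
    if days <= 0:
        raise ValueError("days must be positive")
    return days * SECONDS_PER_DAY


def stanza(index_name: str, days: int) -> str:
    """Render an indexes.conf stanza for the given index name and retention."""
    return f"[{index_name}]\nfrozenTimePeriodInSecs = {retention_seconds(days)}\n"


if __name__ == "__main__":
    print(stanza("yourindexname", 20))
```

By the same arithmetic, the original value of 120 is only two minutes of retention.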

And yes, I agree that the name "frozenTimePeriodInSecs" is confusing...

IN ADDITION, you must realize that Splunk will NEVER exceed the maximum size for an index. It will freeze the oldest bucket if needed to ensure that the size is not exceeded -- even if the oldest bucket is less than 20 days old! "Freezing" is the Splunk term for "retiring," if I understand you correctly.

So make sure that you have sufficient space in your index. Specify it with

maxTotalDataSizeMB = n

where n is the total size of the index in MB. The default is 500,000 MB.
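To illustrate why the size cap can override the age setting, here is a simplified model of the behavior described above. It is not Splunk's actual implementation: the `Bucket` class and `freeze_oldest_until_under_cap` are hypothetical names, and ordering buckets by their newest event is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bucket:
    name: str          # hypothetical label, for illustration only
    latest_event: int  # epoch seconds of the newest event in the bucket
    size_mb: int       # size of the bucket on disk, in MB


def freeze_oldest_until_under_cap(
    buckets: List[Bucket], max_total_mb: int
) -> Tuple[List[Bucket], List[Bucket]]:
    """Freeze the oldest buckets until the total size fits under the cap.

    Returns (kept, frozen). Bucket age is ignored on purpose: the cap wins.
    """
    kept = sorted(buckets, key=lambda b: b.latest_event)  # oldest first
    frozen: List[Bucket] = []
    while kept and sum(b.size_mb for b in kept) > max_total_mb:
        frozen.append(kept.pop(0))
    return kept, frozen


if __name__ == "__main__":
    buckets = [
        Bucket("a", latest_event=100, size_mb=400),
        Bucket("b", latest_event=200, size_mb=400),
        Bucket("c", latest_event=300, size_mb=400),
    ]
    kept, frozen = freeze_oldest_until_under_cap(buckets, max_total_mb=1000)
    print([b.name for b in kept], [b.name for b in frozen])
```

In this model, three 400 MB buckets under a 1000 MB cap cause the oldest bucket to be frozen no matter how recent its data is.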

I don't think that you should set maxHotIdleSecs at all. Unless you have a deep knowledge of Splunk, let it decide how to organize the buckets.

Legend

If you really wanted Splunk to retire data after 5 minutes, you would need buckets that each hold only 5 minutes' worth of data. The default maximum bucket size is 750MB. As Gerald points out, it is a whole bucket that is frozen, not individual events, so you would have to set a bucket size equivalent to 5 minutes of data. BUT it is not efficient for Splunk to have very small buckets, so I think you ended up doing the right thing by managing your index by size.

Splunk Employee

It will almost never work with a value as low as 5 minutes. If you read the docs carefully, you will see that freezing is applied when all data in a bucket is older than the specified time. The settings are all followed, but they only apply to an entire bucket at a time.

Path Finder

Thanks. In the end I still could not figure out how frozenTimePeriodInSecs works. Maybe I'm hitting some other threshold, but when I tested it with 5 minutes it did not quite work the way I expected. So finally I just use maxTotalDataSizeMB.

Champion

Have a read through:
http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention

and

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Setaretirementandarchivingpolicy

It's a pretty good explanation of the data retention methods you can use and how they work. It's likely that your hot buckets are never rolling to cold because of your configuration, so your indexes are empty after a restart.

UPDATE:

OK, so to clarify: a bucket contains the event data, and there can be multiple hot buckets per index. So indexes aren't placed in a bucket; rather, the index is the collection of buckets.
As per:
http://docs.splunk.com/Documentation/Splunk/4.2.4/admin/Indexesconf

frozenTimePeriodInSecs sets the age in seconds, measured from the timestamps of the events in a bucket, at which that bucket rolls to frozen. If you don't provide any scripted action for frozen buckets (e.g. a script to copy them to an archive location), the buckets are simply deleted. This effectively lets you specify a "lifetime" in seconds after which indexed data is deleted.
You can also limit the index by size in MB, as per the above link to indexes.conf, but this obviously isn't as exact as a limit in seconds, because your MB usage can fluctuate.

Is there a specific reason you are looking at idle time? A low idle time could result in many small hot buckets, which means lots of open file handles every time you run a search, which in turn could hurt performance.
Unless there is something specific you want to achieve beyond setting a retention time after which data is deleted, stick to frozenTimePeriodInSecs.

Splunk can generally handle the roll from hot to warm, warm to cold, and then to frozen by itself.
Finally, as per the documentation:

"Splunk ages out data by buckets. Specifically, when the most recent data in a particular bucket reaches the configured age, the entire bucket is rolled."

So data will be deleted when the most recent data within the cold bucket reaches the frozen time.
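The bucket-level rule quoted from the documentation can be sketched as a simple check. This is an illustrative model only, not Splunk code: the function name `bucket_is_frozen_by_age` and the example timestamps are assumptions.

```python
DAY = 60 * 60 * 24  # seconds per day


def bucket_is_frozen_by_age(
    latest_event_time: int, now: int, frozen_time_period_secs: int
) -> bool:
    """Return True when even the newest event in the bucket is past retention.

    Only the newest event matters: older events in the same bucket stay
    searchable until the whole bucket ages out.
    """
    return now - latest_event_time > frozen_time_period_secs


if __name__ == "__main__":
    now = 100 * DAY
    retention = 20 * DAY

    # Bucket holding events from 30 days ago up to 5 days ago: it is kept,
    # even though some of its events are older than 20 days.
    print(bucket_is_frozen_by_age(now - 5 * DAY, now, retention))   # False

    # Bucket whose newest event is 21 days old: the whole bucket is frozen.
    print(bucket_is_frozen_by_age(now - 21 * DAY, now, retention))  # True
```

This is also why a 5-minute test rarely behaves as expected: a bucket that keeps receiving fresh events never has a newest event older than the threshold.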

Champion

Ok, I'll update my answer with more detail

Path Finder

I read those pages, but I'm still having trouble understanding how it works.
1. Does frozenTimePeriodInSecs start counting after indexes are placed in the warm bucket?
2. Is there a time counter that rolls warm buckets to the cold or frozen state? Something like maxWarmIdleSecs.

Splunk Employee

Could you please specify exactly what you mean by "retire"? Please state more clearly what you want to have happen to data in an index, and after what period of time.

Splunk Employee

You have said that any data older than 120 seconds may be deleted. I assume this is a mistake. It's not clear to me what exactly you actually want, or what you think maxHotIdleSecs does, but it is likely that you are misunderstanding its function.

Path Finder

Thanks for the replies. I may really be misunderstanding the functionality here. My goal is to keep indexes for 20 days (or 10 days after indexing) and then delete them permanently.
In my understanding, maxHotIdleSecs is the time spent in a hot bucket and frozenTimePeriodInSecs is the time spent in cold before the indexes are permanently removed. I may also be misusing the term "retire"; what I meant is removing indexes after they reach a certain age.

Ultra Champion

Probably a swap of the intended values, is my guess. However, the names of some parameters are less than obvious. This is one of them.

A better name would probably be something like "retentionAge" or "retireToFrozen", and it should accept values like 1d, 4h, 2y or 3m.
