What do other users think of our retention policy solution using a nightly scheduled report to search and delete events older than 180 days?

krusty
Contributor

Hi,

We have been using Splunk for a couple of years, and to implement our retention policy we created a report that is scheduled to run every night.
The report is run by a dedicated user account that is used only for this report. This user has permission to delete events from Splunk.

The search string looks like this:

index=* NOT (index=_* OR index=history OR index=main OR index=os OR index=splunklogger OR index=summary) latest=-180d@ | delete

So far we have had no problems with this solution. Note that we have more than 200 indexes defined on our indexer, and it is very important that none of them contain events older than 180 days.

I'd like to discuss this solution with you. What do you think about it? Is this a proper way to delete all events older than a specific age?

Thanks for ideas and answers.

1 Solution

lguinn2
Legend

If you set these two settings in indexes.conf for each index

maxHotSpanSecs = 86400
frozenTimePeriodInSecs = 15552000

Each bucket will contain exactly one day's data, and buckets will roll at midnight. frozenTimePeriodInSecs will then roll buckets to frozen once they are older than 180 days (15552000 seconds); by default, frozen buckets are deleted. This combination of settings will guarantee that there is no data in the index older than 180 days.

This will solve the problem mentioned by @MuS, where a bucket could contain data from different days.

Your solution does not recover the disk space and is not best practice. (Eventually the disk space is recovered, once the buckets finally age out.) Also, if the scheduled search fails to run for any reason, you will have excess data in your indexes. If you set the parameters in indexes.conf, then even if Splunk has been down, it will apply the indexes.conf policy as soon as it starts up again and age out data older than 180 days.

And, using frozenTimePeriodInSecs allows you to set different retentions for different indexes. At some future point, you may want to do this.

FYI, please remember that Splunk will never consume more disk space than is allocated for an index. So it is possible to end up with fewer than 180 days of data in an index if insufficient disk space is allocated for the events. Be sure to check the index size, too. This is also set in indexes.conf, as maxTotalDataSizeMB.
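
As a minimal sketch of how these settings might sit together in indexes.conf: the index name my_index and its paths are hypothetical, and only the two retention settings come from the answer above.

```ini
# indexes.conf (sketch; "my_index" and its paths are hypothetical)
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# A hot bucket spans at most one day (86400 seconds)
maxHotSpanSecs = 86400
# 180 days * 86400 seconds per day = 15552000 seconds
frozenTimePeriodInSecs = 15552000
```

A stanza like this would be repeated for each index that needs the 180-day policy. Using a different frozenTimePeriodInSecs value in a stanza gives that index a different retention period.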

krusty
Contributor

@lguinn, thanks for your answer. That's exactly what I was looking for.
Unfortunately, I didn't completely understand the manual's description of these settings in indexes.conf.

One short question about the maxTotalDataSizeMB setting: if I set it to auto, am I on the safe side? The index could then grow as needed, and with the other two settings you described, the buckets would still roll each day.
My problem is that the amount of incoming data for the different indexes is not the same each day, so I can't work out how many MB is right for each index. In that case it makes sense to me to set the value to auto.
For your information, the indexer has plenty of disk space, so we should not run into trouble there.

Kind regards, and once again thanks for your reply.

lguinn2
Legend

No, auto is only used for the size of a single bucket, not the size of the index overall. You must set an actual value for maxTotalDataSizeMB; if you don't, the default size is 500000 MB (roughly 500 GB). You will need to monitor your indexes to make sure that they don't exceed their maximum size allocation.

For the size of a bucket, use maxDataSize. If set to auto, the maximum size of a single bucket will be 750 MB. The auto_high_volume setting is 10 GB. I suggest that you set this to a size that approximates the amount of data (on disk) that is added to the index each day, or less. However, I would never set a bucket size lower than 750 MB.
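
To illustrate the distinction between the two size settings, here is a hedged sketch of a stanza. The index name my_index is hypothetical, and the values shown are the defaults mentioned above rather than recommendations.

```ini
# indexes.conf (sketch; "my_index" is a hypothetical index name)
[my_index]
# Cap for the whole index, in MB. Needs a number; "auto" is not accepted here.
maxTotalDataSizeMB = 500000
# Cap for a single bucket: auto (750 MB), auto_high_volume (10 GB), or a number in MB.
maxDataSize = auto
```

If an index reaches maxTotalDataSizeMB before its data is 180 days old, the oldest buckets are frozen early, so the size cap should be chosen with the expected 180-day volume in mind.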

sk314
Builder

Why can't you just set frozenTimePeriodInSecs in indexes.conf for each index? Just curious.

MuS
SplunkTrust

Because with frozenTimePeriodInSecs alone you can still have older events in your buckets: every event in a bucket must be older than frozenTimePeriodInSecs before the bucket will roll to frozen.

sk314
Builder

TIL. Thanks.

lguinn2
Legend

@MuS - respectfully disagree, because you can set maxHotSpanSecs to overcome that problem.

MuS
SplunkTrust

@lguinn, no problem at all, and thanks for my TIL as well, because your answer and the combination of the two options is brilliant!
