Hi,
We have been using Splunk for a couple of years, and to implement our retention policy, we created a report that is scheduled to run every night.
The report is executed by a dedicated user account that is used only to schedule this report. This user has permission to delete events from Splunk.
The search string looks like this:
index=* NOT (index=_* OR index=history OR index=main OR index=os OR index=splunklogger OR index=summary) latest=-180d@d | delete
So far we have had no problems with this solution. For context: we have more than 200 indexes defined on our indexer, and it is very important that none of them contain events older than 180 days.
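As a sanity check, the same filter can be run with stats instead of the delete command to preview what would be removed, for example:

index=* NOT (index=_* OR index=history OR index=main OR index=os OR index=splunklogger OR index=summary) latest=-180d@d | stats count earliest(_time) AS oldest latest(_time) AS newest BY index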
I'd like to discuss this solution with you. What do you think about it? Is this a proper way to delete all events with a specific Age?
Thanks for your ideas and answers.
If you set these two settings in indexes.conf for each index:

maxHotSpanSecs = 86400
frozenTimePeriodInSecs = 15552000

each bucket will contain exactly one day's data, and buckets will roll at midnight. The frozenTimePeriodInSecs setting will roll the buckets to frozen after 180 days (15552000 seconds). This combination of settings will guarantee that there is no data in the index older than 180 days.
This will solve the problem mentioned by @MuS, where a bucket could contain data from different days.
Your solution does not recover the disk space and is not best practice. (Although eventually it will recover the disk space, as the buckets finally age.) Also, if the scheduled search fails to run for any reason, you will have excess data in your indexes. If you set the parameters in indexes.conf, then even if Splunk has been down, it will immediately follow the indexes.conf policy when it starts up again and age out the data older than 180 days.
And, using frozenTimePeriodInSecs allows you to set different retentions for different indexes. At some future point, you may want to do this.
FYI, please remember that Splunk will never consume more disk space than is allocated for an index. So it is possible that you could have an index with fewer than 180 days of data, if insufficient disk space is allocated for the events. So be sure to check the index size too; this is also set in indexes.conf, as maxTotalDataSizeMB.
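Putting it all together, a per-index stanza might look like this (my_index and the size value below are placeholders; choose a maxTotalDataSizeMB that fits your daily volume over 180 days):

[my_index]
maxHotSpanSecs = 86400
frozenTimePeriodInSecs = 15552000
maxTotalDataSizeMB = 100000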
@lguinn, thanks for your answer. That's exactly what I'm searching for.
Unfortunately, I didn't completely understand the manual regarding these settings in indexes.conf.
One short question about the maxTotalDataSizeMB setting: if I set it to auto, am I on the safe side? Then the index can grow as it needs to, and with the other two settings you described, I should be safe that the buckets are rolled each day.
My problem is that the amount of data coming into the different indexes is not the same each day, so I cannot calculate how many MB is right for each index. That is why setting the value to auto makes sense to me.
For your information, the disk space on the indexer is big enough, so we should not run into trouble there.
Kind regards, and once again thanks for your reply.
No, auto is only used for the size of a single bucket, not the size of the index overall. You must set an actual value for maxTotalDataSizeMB; if you don't, the default size is 500000 (500GB). You will need to monitor your indexes to make sure that they don't exceed their maximum size allocation.
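One way to monitor this (just a sketch) is the dbinspect command, which reports per-bucket sizes on disk, summed here per index:

| dbinspect index=* | stats sum(sizeOnDiskMB) AS totalMB BY index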
For the size of a bucket, use maxDataSize. If set to auto, the maximum size of a single bucket will be 750MB; the auto_high_volume setting is 10GB. I suggest that you set this to a size that approximates the amount of data (on disk) that is added to the index each day, or less. However, I would never set a bucket size lower than 750MB.
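For example, a busy index could use the built-in auto_high_volume value (the stanza name below is a placeholder):

[high_volume_index]
maxDataSize = auto_high_volume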
Why can't you set frozenTimePeriodInSecs in indexes.conf for each index? Just curious.
Because with frozenTimePeriodInSecs alone, you can still have older events in your buckets: every event in a bucket must be older than frozenTimePeriodInSecs before the bucket will roll. For example, a bucket holding events from both 200 days ago and yesterday will not freeze, even though some of its events are past the retention period.
TIL. Thanks.
@MuS - respectfully disagree, because you can set maxHotSpanSecs to overcome that problem.
@lguinn, no problem at all, and thanks for my TIL as well, because your answer and the combination of the two options is brilliant!