We have a requirement for 5-year data retention. Unfortunately, we discovered that Splunk has not been configured for this and we've been losing data. We would like to set up buckets for data retention. We'd like to be able to search at least 1 year back, perhaps a little more, and then archive the data for auditing purposes.
500 GB seems to be about 10-11 months of data. I'm a bit new to setting up buckets, so I'm wondering about the best way to do this. Do we roll every month or so to warm buckets and then from there to cold or frozen? What would a configuration like that look like? Our indexes.conf has nothing but the max frozen time, and nothing is being frozen yet, so that doesn't really help.
Hi @Hiattech
If you want to retain the data as searchable for 12 months then you will need to set frozenTimePeriodInSecs to 31536000 (365*24*60*60) for the relevant indexes in your indexes.conf - how you deploy this depends on your setup. There are also other settings to think about, such as maxTotalDataSizeMB, which defaults to 500 GB and may be a little small for 12 months of data. It also depends on other factors, such as whether you are using an indexer cluster and the specifics of your Search/Replication Factor.
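As a minimal sketch of those two settings together (my_index is a placeholder, and the size value is just an example - take yours from capacity planning):
[my_index]
# keep events searchable for 12 months
frozenTimePeriodInSecs = 31536000
# raise the 500 GB default so the size cap doesn't delete data before the age cap
maxTotalDataSizeMB = 1048576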
In terms of what happens to the data after 12 months, the data will be "frozen" / archived, and this is your opportunity to move it elsewhere for compliance/audit. This is where the coldToFrozenScript or coldToFrozenDir setting comes in handy: it lets you move the frozen data to a dedicated location.
coldToFrozenScript = <path to script interpreter> <path to script>
* Specifies a script to run when data is to leave the splunk index system.
* Essentially, this implements any archival tasks before the data is
deleted out of its default location.
* Add "$DIR" (including quotes) to this setting on Windows (see below
for details).
coldToFrozenDir = <path to frozen archive>
* An alternative to a 'coldToFrozen' script - this setting lets you
specify a destination path for the frozen archive.
* Splunk software automatically puts frozen buckets in this directory
* For information on how buckets created by different versions are
handled, see "Freezing and Thawing" below.
* If both 'coldToFrozenDir' and 'coldToFrozenScript' are specified,
'coldToFrozenDir' takes precedence
* You must restart splunkd after changing this setting. Reloading the
configuration does not suffice.
For more info see https://help.splunk.com/en/splunk-enterprise/administer/admin-manual/9.4/configuration-file-referenc...
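As a usage sketch for the script variant: Splunk ships a sample archiving script you can copy and adapt; the exact filename and path below are an assumption, so verify them on your install before relying on this:
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/coldToFrozenExample.py"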
While looking into the various settings, I discovered that the default maximum index size (not the max frozen time) is set to 500 GB, and because we're hitting 500 GB within a year, it's just deleting the data. I thought it would move the data to warm or even cold buckets and then the data would fall off after 5 years. Is this not the case? Is it straight up limited to 500 GB per index, after which it just deletes the data?
Hi @Hiattech ,
as I said, 500 GB is the default maximum size of an index. If you set maxTotalDataSizeMB to the size you need (based on your capacity planning), you can index all the logs you want in a year (obviously subject to your license and the capacity of your storage!).
As I said, you have to work out the maximum size your index will reach in 5 years and set up both maxTotalDataSizeMB and frozenTimePeriodInSecs to keep all your data for 5 years.
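As a worked example with hypothetical numbers: an index ingesting 20 GB/day needs roughly 20 * 365 * 5 = 36,500 GB over 5 years, so maxTotalDataSizeMB would be about 37,376,000 (36,500 * 1024), plus whatever safety margin you want on top.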
Ciao.
Giuseppe
Got it. Sorry, I guess that didn't click at first. I'll update those settings for the 2 indexes in question. The other indexes seem to be well within the size/age limits.
Hi @Hiattech ,
good for you, see you next time!
let us know if we can help you more, or, please, accept one answer so it can help other people in the Community.
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the contributors 😉
As others already mentioned, the rolling itself hot->warm->cold(->frozen) happens automatically based on configured parameters.
Hot->warm and warm->cold rolling happens mostly for performance reasons. You might want to tune parameters here, but they don't affect the overall retention for the index. It's the cold->frozen rolling which is mostly controlled by maxTotalDataSizeMB and frozenTimePeriodInSecs.
Splunk rolls data to frozen (by default it just deletes the data if no freezing process is defined) when either the index exceeds the defined size or a bucket's latest event is older than the retention period.
There are additional constraints if you have your data organized into volumes, and SmartStore adds to the complexity, but I suppose you have neither.
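To see which values an index is actually running with (defaults included), you can ask btool; a quick sketch, with my_index as a placeholder:
$SPLUNK_HOME/bin/splunk btool indexes list my_index --debug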
Hi @Hiattech ,
by default, Splunk indexes have a retention of about 6 years and a maximum size of 500 GB.
If you need a retention of 5 years, you should do a capacity plan to understand how many logs you index daily into each index; then, multiplying by 365 and by 5, you will have the required size of your index.
Having this, you should configure two parameters for each index: maxTotalDataSizeMB and frozenTimePeriodInSecs.
So, if you have an average daily ingestion of 100 GB, want a 10% margin, and want a retention of 5 years, you should set up something like the sketch below.
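A sketch with those numbers (my_index is a placeholder):
[my_index]
# 100 GB/day * 365 days * 5 years * 1.1 margin = 200,750 GB ≈ 205,568,000 MB
maxTotalDataSizeMB = 205568000
# 5 * 365 * 24 * 60 * 60 = 157,680,000 seconds
frozenTimePeriodInSecs = 157680000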
Then you should decide whether you need all 5 years of logs to be searchable, or whether a shorter searchable window is acceptable, saving the older logs so they can be manually restored if required.
In that case, you could set a shorter period in frozenTimePeriodInSecs and define a script to save the frozen buckets for the rest of the 5 years.
Ciao.
Giuseppe
@Hiattech
You don’t manually “roll” buckets each month — bucket movement (hot → warm → cold → frozen) is automatic, based on size, age, and retention settings in indexes.conf.
Sample config you can adapt:
[my_index]
homePath = $SPLUNK_DB/my_index/db
coldPath = /mnt/slowstorage/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Keep data searchable for ~1 year
frozenTimePeriodInSecs = 31536000
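# Raise the size cap so the 500 GB default doesn't delete data
# before the 1-year age limit is reached (value below is a
# placeholder - size it from your own capacity planning)
maxTotalDataSizeMB = 614400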
# Archive frozen buckets instead of deleting
coldToFrozenDir = /mnt/archive/my_index/frozendb
# OR use a script:
# coldToFrozenScript = $SPLUNK_HOME/bin/scripts/archive_to_s3.sh
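After a restart you can sanity-check the bucket states with a quick search (my_index again being a placeholder):
| dbinspect index=my_index | stats count by state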
Regards,
Prewin
If this answer helped you, please consider marking it as the solution or giving a Karma. Thanks!