Hi, I am currently setting up retention for an index. I want a retention period of 1 year, after which the data will be deleted.
I'm also concerned about whether data will move from hot/warm to cold so that my retention is actually enforced.
My daily data ingestion to this index is 50 MB. I decided to add maxHotSpanSecs = 86400, so that after 24 hours data will roll to a warm bucket.
As my data ingestion is low, I also want to add maxWarmDBCount = 10, assuming that each bucket on a 64-bit system can take up to 10 GB of space, so 10 buckets would take around 100 GB before data rolls to cold.
So at 50 MB/day ingestion, will it take around 200 days to fill one bucket?
Please give me a clear picture of this. Here is how I plan to set it up:
[xxxxxxx]
homePath = $SPLUNK_DB/xxxxxxxxxx/db
coldPath = $SPLUNK_DB/xxxxxxxxxx/colddb
thawedPath = $SPLUNK_DB/xxxxxxx/thaweddb
repFactor = auto
maxHotSpanSecs = 86400
maxWarmDBCount = 10
frozenTimePeriodInSecs = 31104000
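For reference, the 200-day and 100 GB figures in the question can be sketched with quick arithmetic (this assumes the 10 GB auto_high_volume bucket size mentioned later in the thread; it is an estimate, not a measurement):

```python
# Rough capacity math for the index above (assumptions:
# 50 MB/day ingestion, 10 GB max bucket size on a 64-bit system).
daily_mb = 50
bucket_mb = 10 * 1024                       # ~10 GB per bucket

days_per_bucket = bucket_mb / daily_mb      # ~205 days to fill one bucket by size
warm_capacity_gb = 10 * 10                  # maxWarmDBCount = 10 -> ~100 GB of warm data
print(round(days_per_bucket), warm_capacity_gb)
```

Note, though, that with maxHotSpanSecs = 86400 in the stanza above, buckets roll after one day at roughly 50 MB each and never come close to the 10 GB size limit.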
Hi Prakash493,
The default bucket size, unless otherwise specified, is 750MB, but that doesn't matter in your scenario.
You're specifying that you only want 10 buckets of warm data, after which point it will roll to cold.
In your case, that's going to be 10 days of warm data (you're setting your hot bucket size to a maximum of 86400 seconds in terms of age, and it's not going to hit the size limit of 750MB in one day), because every time a hot bucket is rolled to warm (every day = 1 bucket), warm will check to see if there are more than 10 buckets and roll the oldest to make room for the new warm bucket. After that, the index will keep the cold bucket for 31104000 seconds (360 days). Once a cold bucket is older than 360 days, it will be deleted.
So, you'll end up with 1 hot bucket, 10 warm buckets and 349 cold buckets, because your buckets will all contain only 1 day's worth of data due to the maxHotSpanSecs setting of 86400.
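The bucket counts above follow from quick arithmetic (a sketch assuming exactly one bucket rolls to warm per day, which is what maxHotSpanSecs = 86400 implies at this data volume):

```python
# Bucket lifecycle under the settings in the question:
# maxHotSpanSecs = 86400, maxWarmDBCount = 10, frozenTimePeriodInSecs = 31104000.
frozen_secs = 31_104_000
seconds_per_day = 86_400
max_warm = 10

retention_days = frozen_secs // seconds_per_day   # 360 days of retained data
hot = 1                                           # the current day's open bucket
warm = max_warm                                   # the 10 most recently rolled days
cold = retention_days - warm - hot                # everything older, not yet frozen
print(hot, warm, cold)                            # 1 10 349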
Really, you only need to worry about:
frozenTimePeriodInSecs = 31536000
(that is, 60 * 60 * 24 * 365)
and
maxTotalDataSizeMB = 20000
(roughly 50 MB * 365 days, plus some headroom).
Since maxTotalDataSizeMB has a default of 500000 (about 1/2 TB), you don't need to set it, considering your volume.
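Those two numbers can be checked with quick arithmetic (assuming the 50 MB/day ingestion figure from the question):

```python
# Sanity-check the suggested retention and size settings.
seconds_per_year = 60 * 60 * 24 * 365
print(seconds_per_year)            # 31536000 -> frozenTimePeriodInSecs for 1 year

yearly_ingest_mb = 50 * 365        # 18250 MB of data ingested per year
print(yearly_ingest_mb)            # comfortably under the 20000 MB suggestion
```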
Hi, I have a global setting of maxTotalDataSizeMB = auto_high_volume. I will set frozenTimePeriodInSecs for 365 days, but my worry is that most of the data won't be able to roll to cold so that it can be deleted under the 365-day retention. How can I determine that data is rolling to cold within a certain period of time? I am thinking of adding maxWarmDBCount so data will roll to cold faster. How can I do the capacity planning to make sure data is rolling to cold, given 50 MB daily ingestion and a global setting of auto_high_volume?
Hi, I have a global setting of maxTotalDataSizeMB = auto_high_volume and I am not adding the maxHotSpanSecs attribute, but I will keep maxWarmDBCount at 10. Does that mean each bucket can hold up to 10 GB of data?
Please clarify the maxWarmDBCount setting. Some answers I've seen say it refers to the number of buckets, not the number of days as you mentioned.
maxWarmDBCount is number of buckets as per the Splunk documentation:
maxWarmDBCount = [Non-negative Integer]
* The maximum number of warm buckets.
* Warm buckets are located in the <homePath> for the index.
* If set to zero, Splunk will not retain any warm buckets
(will roll them to cold as soon as it can)
* Highest legal value is 4294967295
* Defaults to 300.
If you want a bucket to span a time period instead of being limited by size, you have to use maxHotSpanSecs. That is the only way to limit a bucket by time range.
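Putting the thread's advice together, a time-bounded stanza might look like the sketch below (the index name and paths are placeholders, and the combination is an illustration rather than a tested configuration):

```ini
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Roll hot buckets to warm after 24 hours, regardless of size
maxHotSpanSecs = 86400
# Keep at most 10 warm buckets; the oldest rolls to cold on overflow
maxWarmDBCount = 10
# Freeze (delete, by default) buckets whose newest event is older than 365 days
frozenTimePeriodInSecs = 31536000
```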
Awesome! Thank you so much for the detailed explanation; I really appreciate your time.
Hi,
If you're NOT using maxHotSpanSecs AND you're using maxDataSize = auto_high_volume (not maxTotalDataSizeMB, which relates to the total size of an index and defaults to 500 GB), then yes, the hot buckets will wait to fill up to 10 GB (~200 days of your 50 MB/day) before rolling, unless you restart Splunk frequently (at which point all hot buckets get rolled to warm). This is probably not the behavior you want, because you'll end up with cold buckets that can stay almost 200 days past the archive date, and they will be 10 GB in size.
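The "almost 200 days past the archive date" point can be sketched numerically (assuming a bucket is only frozen once its newest event is older than frozenTimePeriodInSecs, so the whole bucket waits on its most recent data):

```python
# Why very large hot buckets delay deletion (assumptions: 50 MB/day
# ingestion, 10 GB buckets, 365-day retention, freeze keyed to the
# newest event in a bucket).
bucket_span_days = (10 * 1024) / 50     # one 10 GB bucket spans ~205 days of data
retention_days = 365

# The oldest event in that bucket can survive this long before the
# bucket as a whole becomes eligible for freezing:
oldest_event_age_at_freeze = retention_days + bucket_span_days
print(round(oldest_event_age_at_freeze))   # ~570 days, well past the 365-day target
```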
As for maxWarmDBCount, according to the latest .spec documentation:
maxWarmDBCount = [Non-negative Integer]
* The maximum number of warm buckets.
* Warm buckets are located in the <homePath> for the index.
* If set to zero, Splunk will not retain any warm buckets
(will roll them to cold as soon as it can)
* Highest legal value is 4294967295
* Defaults to 300.
This is just an estimate of what will happen, because restarts and other events will roll hot buckets before they reach that ~200-day fill point.
Ultimately, I'm not sure that this is the behavior you want for your bucket strategy.