Getting Data In

Splunk Indexes.conf retention

Prakash493
Communicator

Hi, I am setting up retention for an index. I want a retention period of 1 year, after which the data should be deleted.

I am worried about whether the data will move from hot/warm to cold so that my retention is enforced.
My daily ingestion into this index is 50 MB. I decided to add maxHotSpanSecs set to 24 hours, after which the data will roll to a warm bucket.
Because my ingestion is low, I also want to add maxWarmDBCount = 10. My assumption is that each bucket on a 64-bit system can take up to 10 GB of space, so 10 buckets will take around 100 GB before data rolls to cold.
So at 50 MB/day, will it take around 200 days to fill one bucket?

Please explain the concept clearly. How should I set this up? My current configuration:

[xxxxxxx]
 homePath = $SPLUNK_DB/xxxxxxxxxx/db
 coldPath = $SPLUNK_DB/xxxxxxxxxx/colddb
 thawedPath = $SPLUNK_DB/xxxxxxx/thaweddb
 repFactor = auto
 maxHotSpanSecs = 86400
 maxWarmDBCount = 10
 frozenTimePeriodInSecs = 31104000
1 Solution

jnudell_2
Builder

Hi Prakash493,
The default bucket size (maxDataSize), unless otherwise specified, is 750MB, but that doesn't matter in your scenario.

You're specifying that you only want 10 buckets of warm data, after which point the oldest warm bucket rolls to cold.

In your case, that's going to be 10 days of warm data. You're limiting your hot bucket to a maximum time span of 86400 seconds, and it's not going to hit the 750MB size limit in one day, so a hot bucket rolls to warm every day (every day = 1 bucket). Every time that happens, Splunk checks whether there are more than 10 warm buckets and rolls the oldest one to cold to make room for the new warm bucket. After that, the index keeps the cold bucket until the most recent event in it is older than 31104000 seconds (360 days), at which point the bucket is deleted.
So you'll end up with 1 hot bucket, 10 warm buckets and 349 cold buckets, because each bucket will contain only 1 day's worth of data due to the maxHotSpanSecs setting of 86400.
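
The bucket counts above can be checked with a little arithmetic. The following is only a rough sketch, not Splunk's actual rolling logic: it assumes a single hot bucket that rolls exactly once per maxHotSpanSecs, no restarts or size-based rolls, and that a bucket is frozen as soon as it ages past frozenTimePeriodInSecs. The function name `steady_state_buckets` is made up for illustration.

```python
def steady_state_buckets(max_hot_span_secs, max_warm_db_count, frozen_time_period_secs):
    """Estimate hot/warm/cold bucket counts once the index reaches steady state.

    Assumption: one hot bucket at a time, each bucket spans exactly
    max_hot_span_secs, and nothing rolls a bucket early.
    """
    # Number of buckets that exist at once before the oldest is frozen.
    total = frozen_time_period_secs // max_hot_span_secs
    hot = 1
    warm = min(max_warm_db_count, total - hot)
    cold = total - hot - warm
    return {"hot": hot, "warm": warm, "cold": cold}


counts = steady_state_buckets(86400, 10, 31104000)
print(counts)  # {'hot': 1, 'warm': 10, 'cold': 349}
```

With frozenTimePeriodInSecs raised to a full 365 days (31536000), the same estimate gives 354 cold buckets instead of 349.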


ddrillic
Ultra Champion

Really, the settings you need to worry about are frozenTimePeriodInSecs = 31536000 (60 * 60 * 24 * 365, i.e. 365 days) and
maxTotalDataSizeMB = 20000 (50 * 365 = 18250, plus some headroom). Since maxTotalDataSizeMB defaults to 500000 (about half a terabyte), you don't need to set it, considering your volume.
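
The two numbers in this reply can be reproduced as follows. This is a back-of-the-envelope sketch: it treats the 50 MB/day figure as the on-disk growth of the index, which is an assumption, since actual disk usage depends on compression and index files. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24

# Default for maxTotalDataSizeMB, as stated in the reply above.
DEFAULT_MAX_TOTAL_DATA_SIZE_MB = 500000


def retention_seconds(days):
    """Value to use for frozenTimePeriodInSecs for a retention of `days` days."""
    return days * SECONDS_PER_DAY


def estimated_index_size_mb(daily_mb, retention_days):
    """Rough index size at steady state, assuming constant daily growth."""
    return daily_mb * retention_days


print(retention_seconds(365))  # 31536000
print(retention_seconds(360))  # 31104000, the value in the original stanza

size_mb = estimated_index_size_mb(50, 365)
print(size_mb)                                   # 18250
print(size_mb < DEFAULT_MAX_TOTAL_DATA_SIZE_MB)  # True
```

Note that the 31104000 in the original stanza corresponds to 360 days, five days short of a full year.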


Prakash493
Communicator

Hi, I have a global setting of maxTotalDataSizeMB=auto_high_volume, and I will set frozenTimePeriodInSecs to 365 days. The issue is that most of the data is not rolling to cold, so it cannot be deleted at the 365-day retention point. How can I determine whether data is rolling to cold within a certain period of time? I am thinking of adding maxWarmDBCount so that data rolls to cold faster. How can I do the capacity planning to make sure data is rolling to cold?

My daily ingestion is 50MB, with the global maximum data size set to auto_high_volume.


Prakash493
Communicator

Hi, I have a global setting of maxTotalDataSizeMB=auto_high_volume and I am not adding the maxHotSpanSecs attribute, but I will keep maxWarmDBCount at 10. Does that mean each bucket can hold up to 10GB of data?

Please clarify the maxWarmDBCount setting. Some answers I have seen say it is the number of buckets, not the number of days as you mentioned.


jnudell_2
Builder

maxWarmDBCount is number of buckets as per the Splunk documentation:

maxWarmDBCount = [Non-negative Integer]
* The maximum number of warm buckets.
* Warm buckets are located in the <homePath> for the index.
* If set to zero, Splunk will not retain any warm buckets
  (will roll them to cold as soon as it can)
* Highest legal value is 4294967295
* Defaults to 300.


Therefore, your original thought would be valid: at 50MB / day, a 10GB bucket would hold about 200 days of data.

If you want to have something span a time period instead of the size of the buckets, you have to use maxHotSpanSecs. That is the only way to limit a bucket by time range.


Prakash493
Communicator

Awesome Thank you so much for detailed explanation really appreciate your time.


jnudell_2
Builder

Hi,
If you're NOT using maxHotSpanSecs AND maxDataSize = auto_high_volume (not maxTotalDataSizeMB, which is the total size of an index and defaults to 500GB), then yes, by size alone the hot buckets will wait to fill up to 10GB (~200 days of your 50MB / day), unless something rolls them earlier, such as a Splunk restart (at which point all hot buckets get rolled to warm) or the default maxHotSpanSecs of 90 days. This is probably not the behavior you want, because you'll have cold buckets that can stay almost 200 days past the archive date, and they will be 10GB in size.
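
As a quick check of the fill-time figure, here is a small sketch. It assumes auto_high_volume means 10GB (taken as 10,000 MB) and auto means 750MB, as stated in this thread, and it ignores restarts and multiple concurrent hot buckets. The function names are hypothetical.

```python
def days_to_fill(bucket_size_mb, daily_mb):
    """Days of ingestion needed to reach the bucket size limit."""
    return bucket_size_mb / daily_mb


def days_until_roll(bucket_size_mb, daily_mb, max_hot_span_days):
    """A hot bucket rolls on whichever limit is hit first: size or time span."""
    return min(days_to_fill(bucket_size_mb, daily_mb), max_hot_span_days)


print(days_to_fill(10_000, 50))         # 200.0 (auto_high_volume, 10GB)
print(days_to_fill(750, 50))            # 15.0  (auto, 750MB)
print(days_until_roll(10_000, 50, 90))  # 90    (a 90-day span limit is hit first)
print(days_until_roll(10_000, 50, 1))   # 1     (maxHotSpanSecs = 86400)
```

At this ingestion rate the time-span limit, not the size limit, is what decides when a hot bucket rolls.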

As for maxWarmDBCount, according to the latest .spec documentation:

maxWarmDBCount = <nonnegative integer>
* The maximum number of warm buckets.
* Warm buckets are located in the <homePath> for the index.
* If set to zero, Splunk will not retain any warm buckets
  (will roll them to cold as soon as it can)
* Highest legal value is 4294967295
* Defaults to 300.

This DEFINITELY means the number of buckets, and not the number of days.
In your configuration, it means you'll have at most 10 buckets in warm that are up to 10GB in size (or about 2,000 days in total length). This means these buckets will not get rolled to cold, because for this index you have 100GB of warm bucket space for a 50MB / day data source. The buckets will only get removed after the frozenTimePeriodInSecs limit is reached for the MOST RECENT EVENT in that bucket.
Therefore, if you don't use maxHotSpanSecs in your configuration, your data retention for your index will look like this:
3 hot buckets for 90 days (the default maxHotSpanSecs is 90 days and the default maxHotBuckets is 3)
roll to 3 warm buckets
3 hot buckets for 90 days + 3 warm buckets for 180 days
roll to 3 warm buckets
3 hot buckets for 90 days + 3 warm buckets for 180 days + 3 warm buckets for 270 days
roll to 3 warm buckets
3 hot buckets for 90 days + 3 warm buckets for 180 days + 3 warm buckets for 270 days + 3 warm buckets for 360 days
roll to 3 warm buckets & delete 3 warm buckets
3 hot buckets for 90 days + 3 warm buckets for 180 days + 3 warm buckets for 270 days + 3 warm buckets for 360 days

This is just an estimate of what will happen, because you'll have reboots and other things that roll hot buckets before they reach the 90 day limit.
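
Because a bucket is only removed once its most recent event passes frozenTimePeriodInSecs, the oldest event in that bucket can outlive the configured retention by roughly the bucket's time span. The sketch below illustrates that with the numbers from this thread. It is a simplification that assumes events arrive continuously and that freezing happens the moment the limit is crossed; the function name is made up.

```python
SECONDS_PER_DAY = 86400


def max_age_of_oldest_event_days(bucket_span_days, frozen_time_period_secs):
    """Approximate age of a bucket's oldest event at the moment the bucket is frozen.

    The bucket is frozen when its newest event reaches the retention limit,
    so its oldest event is older by the bucket's time span.
    """
    return bucket_span_days + frozen_time_period_secs / SECONDS_PER_DAY


print(max_age_of_oldest_event_days(1, 31104000))    # 361.0 (maxHotSpanSecs = 86400)
print(max_age_of_oldest_event_days(90, 31104000))   # 450.0 (default 90-day span)
print(max_age_of_oldest_event_days(200, 31104000))  # 560.0 (a bucket filled over ~200 days)
```

This is why one-day buckets keep actual retention close to the configured value, while long-spanning buckets hold data well past it.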

Ultimately, I'm not sure that this is the behavior you want for your bucket strategy.
