Deployment Architecture

How to deal with bucket sizes and retention policy

chaseatul
Explorer

I have fairly limited knowledge of buckets. I have a single 2 TB storage drive available to me, and the retention policy is 18 months. The Splunk indexer is capturing Windows logs, and the data volume is roughly 2-3 GB per day.

Can I have only Hot and Cold bucket, or do we need Warm bucket as well?

Can I have Hot, Warm and Cold buckets in the same data partition?

Where do I define the folder for buckets (Ex: Z:/...../Hot, Z:/...../Warm, Z:/...../Cold) ?

For what duration will data remain in the hot and warm buckets before it gets rolled? (Can I define 1 week for hot, 90 days for warm, and 18 months for cold, after which it can be purged?)

What entries do I need to make in indexes.conf to achieve this, so that my data doesn't get deleted before the retention date?

Can we back up the hot buckets too, without taking them offline?

1 Solution

kristian_kolb
Ultra Champion

From a storage perspective, you should treat hot and warm the same. They must exist in the same path, and the only difference is that hot buckets are actively being written to. In your case, as you just have a single drive, there is no real difference between hot/warm and cold either.

The definition of the paths for hot/warm, cold and frozen for an index can be done in the GUI, or in indexes.conf.

As for the retention time, it is controlled for the index as a whole, i.e. hot+warm+cold. This can only be defined in indexes.conf.

What you probably need is to set the retention time to 18 months via the following setting; the value corresponds to just under 550 days (~18 months):

frozenTimePeriodInSecs = 47500000
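As a quick sanity check on that number (a sketch, not part of the original answer), the conversion from 18 months to seconds works out like this:

```python
# Sanity check for frozenTimePeriodInSecs: 18 months expressed in seconds.
# Assumes an average month of ~30.44 days (365.25 / 12).
SECONDS_PER_DAY = 24 * 60 * 60            # 86400

months = 18
avg_days_per_month = 365.25 / 12          # ~30.44
retention_secs = months * avg_days_per_month * SECONDS_PER_DAY

print(round(retention_secs))              # 47336400 seconds
print(47_500_000 / SECONDS_PER_DAY)       # ~549.8 days, just under 550
```

So 47500000 gives a small safety margin over a strict 18 months.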

Additionally, you'd want to set the total size to a higher value than the default 500 GB:

maxTotalDataSizeMB = 1900000

Assuming that you index 3 GB of raw logs per day, and with an average compression ratio of 50%, your 2TB drive will last you more than 3 years.
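The sizing claim above can be checked with a bit of arithmetic (a sketch; the 50% compression figure is the assumption stated in the answer):

```python
# Rough lifetime estimate for a 2 TB drive at 3 GB/day of raw ingest,
# assuming the indexed data occupies ~50% of the raw size.
raw_gb_per_day = 3
compression = 0.5          # on-disk size as a fraction of raw volume
disk_gb = 2000             # ~2 TB usable

days = disk_gb / (raw_gb_per_day * compression)
print(days)                # ~1333 days
print(days / 365)          # ~3.65 years, i.e. more than 3 years
```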

As for backing up, you should read the following:
http://docs.splunk.com/Documentation/Splunk/6.0.1/Indexer/Backupindexeddata

Hope this helps,

/K

View solution in original post


kristian_kolb
Ultra Champion

Yes, start with these settings. You can always alter stuff like maxDataSize, maxHotIdleSecs and maxWarmDBCount later without really changing anything major.

Once in production, changing (lowering) maxTotalDataSizeMB or frozenTimePeriodInSecs by too much may cause Splunk to start deleting data (oldest first). You must understand the implications of such changes before you perform them.

Best of luck,

K

0 Karma

chaseatul
Explorer

So would it be right to say that indexes.conf should have the settings below to meet my requirements per policy, assuming the storage drives are in different disk groups?

[main]
homePath = SANDISK1:\splunk\myindex\db
coldPath = SANDISK2:\splunk\myindex\colddb
maxTotalDataSizeMB = 1900000        # maximum size of the index, ~2 TB
maxDataSize = 2048                  # bucket size of 2 GB
maxHotIdleSecs = 86400              # 1 day, for the hot-to-warm roll
maxWarmDBCount = 30                 # 30 buckets = 30 days, for warm-to-cold
frozenTimePeriodInSecs = 47500000   # keep data online in hot/warm/cold for 18 months, then delete

0 Karma

kristian_kolb
Ultra Champion

As I said before - if you have a requirement of 18 months worth of online data, set it - for the index as a whole - via the frozenTimePeriodInSecs parameter.

If the intended storage for hot/warm and cold is the same, there is no point (from a technical/functional perspective) in striving for "no warm buckets". With an understanding of how hot/warm/cold/frozen buckets work, you can explain to the policy-maker why warm vs. cold does not matter.

If you really insist, there is an additional parameter controlling the maximum number of warm buckets in an index.

maxWarmDBCount = <positive integer>
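For illustration only (the index name and paths here are hypothetical; $SPLUNK_DB is Splunk's standard index-storage variable), the parameter sits in the index stanza of indexes.conf:

```ini
[my_windows_logs]
homePath   = $SPLUNK_DB/my_windows_logs/db
coldPath   = $SPLUNK_DB/my_windows_logs/colddb
thawedPath = $SPLUNK_DB/my_windows_logs/thaweddb
# Keep the warm bucket count low so buckets roll to cold quickly
maxWarmDBCount = 1
```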

/K

0 Karma

chaseatul
Explorer

My policy says we need 18 months of data which is readily available: 30 days of data in the hot bucket and 18 months in the cold bucket. Is there an easy way to move from the hot bucket directly to the cold bucket and not have any warm bucket at all? Otherwise I will use 1 day for hot, 30 days for warm, and 18 months for cold.
Can you please advise on both scenarios?

0 Karma

kristian_kolb
Ultra Champion

You are aware that cold buckets are still online and searchable?

Does your policy say that you need to have 30 days worth of log data online, and 18 months 'readily available'?

In that case, you may be thinking of frozen rather than cold, i.e. where you set up a path for archiving the data that should be removed from the online system, but it must still be possible to re-import it if the need arises.

If your policy actually says that you need 30 days warm and 18 months cold, and you use the same underlying storage - you have a case for challenging the policy.

/K

0 Karma

chaseatul
Explorer

As per the policy, I need the warm bucket to cover 30 days, and the rest of the data should be in the cold bucket. As the data flow might vary, how can I make sure that 1 day of data is in the hot bucket, 30 days in the warm bucket, and 18 months in the cold bucket, after which it can be deleted?

What settings would help me achieve this?

0 Karma

kristian_kolb
Ultra Champion

I guess that you could remove the explicit settings for maxDataSize, maxHotIdleSecs and maxWarmDBCount and just go with the default values.

There is no particular point in trying to gear your storage so that 1 bucket = 1 day. Also, if your warm and cold data resides on the same kind of disk (in terms of speed/cost), there is little need to configure anything specifically. Hot data is not searched faster or treated differently than cold data by design; only the underlying storage speed affects performance.

chaseatul
Explorer

[main]
homePath = SANDISK:\splunk\myindex\db
coldPath = SANDISK:\splunk\myindex\colddb
maxTotalDataSizeMB = 1900000        # maximum size of the index, ~2 TB
maxDataSize = 2048                  # bucket size of 2 GB - is this OK?
maxHotIdleSecs = 86400              # 1 day, for the hot-to-warm roll - do I need this line?
maxWarmDBCount = 30                 # 30 buckets = 30 days, for warm-to-cold
frozenTimePeriodInSecs = 47500000   # 550 days in seconds; keep data online for 18 months, then delete

If I define this, will it be OK? Or should I take out some lines that are not applicable in my case, or add a few?
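Following the advice earlier in the thread (drop the explicit bucket-tuning settings and keep only size and retention), a minimal stanza might look like the sketch below; the paths are the poster's own and purely illustrative:

```ini
[main]
homePath   = SANDISK:\splunk\myindex\db
coldPath   = SANDISK:\splunk\myindex\colddb
thawedPath = SANDISK:\splunk\myindex\thaweddb
# Cap the index a bit below the 2 TB drive
maxTotalDataSizeMB = 1900000
# ~550 days in seconds: keep hot+warm+cold for 18 months, then delete
frozenTimePeriodInSecs = 47500000
```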
