Splunk Search

Are there any search and performance pitfalls with keeping data in hot buckets for 1 month and moving it from hot to cold directly?

KomalSharma
Explorer

I have gone through the documentation and want to check if a scenario like this will work out:
- Hold 1 month of data in hot buckets (maxHotSpanSecs=2628000)
- Move data from hot to cold directly (maxWarmDBCount=0, frozenTimePeriodInSecs=31536000)

The colddb is on different, slower storage.
Are there any pitfalls to this approach in terms of search results and performance?

Thanks,
Komal

1 Solution

lguinn2
Legend

I would not keep a month of data in hot buckets. Warm buckets and hot buckets can both be searched very quickly. But hot buckets are open for writing and warm buckets are not. Therefore warm buckets can be backed up and are less vulnerable to corruption.

Plus, Splunk sometimes reorganizes hot buckets to optimize search. I would never set maxHotBuckets=1, unless you can guarantee that your data arrives in strict time sequence. And really, I still wouldn't do it. Also, you do not want your hot buckets to get too large, as this also impacts search speed.

I would set the bucket size to approximately 1 day's worth of data, but not smaller than 750MB or larger than 10GB. Let's face it, Splunk engineering has years of experience in understanding the range of optimum bucket sizes.
Setting

maxWarmDBCount = 31

means that you will have at most 31 warm buckets, plus your hot buckets, for approximately a month of data in hot/warm. I would also leave maxHotBuckets at the default setting of 3. I see no need to set maxHotSpanSecs. If you are concerned about disk space usage for hot/warm buckets, set

homePath.maxDataSizeMB = XYZ

where XYZ is the maximum amount of disk space in MB that you want the hot/warm buckets to use. Splunk will never let hot/warm use more than this, even if that means that you end up with fewer than 31 warm buckets...

Finally, look at the settings for the default index, main. It comes configured to manage a fairly high volume of incoming data. I would start with the same settings for my index and then tune (like the maxWarmDBCount and homePath.maxDataSizeMB) for the particular situation. Again, leverage the experience of Splunk engineering - they figured out these defaults!
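
As a rough illustration, the advice above might translate into an indexes.conf stanza along these lines. The index name my_index, the paths, and the 40000 MB cap are placeholders assumed for the example, not values from this thread; only maxWarmDBCount = 31 and leaving maxHotBuckets and maxHotSpanSecs at their defaults come from the answer itself.

```ini
# Hypothetical index; the name and paths are placeholders.
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb

# Let Splunk pick the bucket size (auto is 750 MB).
maxDataSize = auto

# Roughly one month in hot/warm if each bucket holds about a day of data.
maxWarmDBCount = 31

# maxHotBuckets and maxHotSpanSecs are deliberately not set, so the defaults apply.

# Hard cap on hot/warm disk usage in MB; 40000 is an assumed stand-in for XYZ.
homePath.maxDataSizeMB = 40000
```

If cold buckets should live on slower storage, as in the original question, coldPath would point at that volume instead of $SPLUNK_DB.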

damode
Motivator

Hi @lguinn [Splunk],

Based on your points: my requirement is 30 days of ACTIVE and 90 days of COLD storage, with hot, warm and cold all on the same storage. Assuming an average of 1 GB of indexed data per day for the main index, would the following settings be right?
maxDataSize = 1000
maxHotBuckets= 3
maxWarmDBCount = 31
homePath.maxDataSizeMB = 32000 (data size equivalent of 30 days + extra)
coldPath.maxDataSizeMB = 90000 (data size equivalent of 90 days)
maxTotalDataSizeMB = 122000
frozenTimePeriodInSecs = 10368000

Thanks,
Dev


sowings_splunk
Splunk Employee

+1!

Don't use a count-based warm bucket sizing rule; there are lots of reasons why hot buckets might roll to warm before they reach their max size. Instead, use @lguinn's suggestion of a size limit on the homePath.

Also, please understand that hot buckets aren't magical. There's nothing about a hot bucket that makes it any different to search than a warm bucket--the former is open for writing, that's all. The only difference between warm and cold is the partition on which they're stored (which may lead to different search performance, but in the common case of "just one big partition for all Splunk data", it does not).
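
A minimal sketch of the size-based approach, reusing the figures from the question above (about 1 GB/day, 30 days in hot/warm, 90 days in cold). The stanza name is a placeholder, the path settings are omitted, and the sizes assume those daily volumes hold on disk, so treat the numbers as a starting point to verify rather than a recommendation.

```ini
# Hypothetical index name; homePath, coldPath and thawedPath omitted for brevity.
[my_index]

# No maxWarmDBCount here: the size cap below, not a bucket count,
# decides when warm buckets roll to cold.
# 30 days x ~1 GB/day, plus some headroom.
homePath.maxDataSizeMB = 32000

# 90 days x ~1 GB/day.
coldPath.maxDataSizeMB = 90000

# 120 days x 86400 s/day = 10368000 s before data is frozen.
frozenTimePeriodInSecs = 10368000
```

With no explicit maxWarmDBCount, the default count limit still exists, so an index that rolls many small buckets could hit it before the size cap.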

somesoni2
Revered Legend

How much data do you expect to have in the hot bucket? If you expect a data volume of about 10 GB in 1 month, you can try these settings:

[yourIndex]
maxDataSize = auto_high_volume
maxHotSpanSecs = 2628000
maxHotBuckets = 1
maxWarmDBCount = 0
frozenTimePeriodInSecs = 31536000

If the volume is more than 10 GB, you can increase the maxHotBuckets value.

If the volume is much lower, like 1-2 GB, you can use maxDataSize = auto (750 MB per hot bucket) and adjust maxHotBuckets to accommodate your maximum volume.


somesoni2
Revered Legend

Expect a performance impact when searching historical data (data older than 1 month), since it will sit on the slower cold storage.


KomalSharma
Explorer

Agreed, a delay for data older than 1 month is fine. Is there any impact on indexing or searching within the same month? I guess maxHotBuckets should also be increased, maybe to 5, to allow for other hot-to-warm rolling scenarios.
