Getting Data In

What is the harm of using auto_high_volume on smaller indexes?

renems
Communicator

Hi,

I found myself at a site where EVERY index is configured with auto_high_volume. I'm aware that it's best practice to use this for large indexes, but what exactly is the harm in having it on smaller indexes? Can anyone explain?

Thx!

1 Solution

somesoni2
Revered Legend

The lifecycle of a bucket is hot -> warm -> cold -> frozen. The retention policy is enforced based on cold buckets only, so if there are no cold buckets, the retention policy will not be applied.
Incoming data is written to hot buckets. The default number of hot buckets is 3 (maxHotBuckets). Data stays in a hot bucket until 1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, or 3) Splunk is restarted. Once rolled, the bucket moves to the warm stage. It is read-only there and stays warm until the number of warm buckets exceeds maxWarmDBCount (default 300), at which point the oldest warm bucket rolls to cold.

If the incoming volume is too low and maxDataSize is set to auto_high_volume, the data may stay in hot or warm buckets forever: with so little volume there is no way you'll cross the default warm bucket count of 300 (unless you restart Splunk 300+ times). So there won't be any cold buckets, and data retention will never be applied.
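As a sketch of the fix described above, a low-volume index can be given a bucket size it will actually fill. The stanza below is hypothetical (the index name, paths, and retention value are illustrative, not from this thread):

[small_index]
homePath   = $SPLUNK_DB/small_index/db
coldPath   = $SPLUNK_DB/small_index/colddb
thawedPath = $SPLUNK_DB/small_index/thaweddb
# "auto" rolls hot buckets at ~750 MB instead of auto_high_volume's 10 GB,
# so warm buckets actually accumulate, eventually roll to cold, and
# retention (frozenTimePeriodInSecs) can take effect.
maxDataSize = auto
frozenTimePeriodInSecs = 31536000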


athorat
Communicator

@somesoni2 nice explanation

1> What happens if we have set the retention period to 1 year, in the case that 1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, or 3) Splunk is restarted?

2> After 90 days, does the hot bucket roll to warm and stay there?
3> What if the indexer is restarted more than 300 times? Will it still hold the data for 1 year?


somesoni2
Revered Legend

Regardless of the retention period, a hot bucket will roll to warm if either maxHotSpanSecs or maxDataSize is reached, or if Splunk is restarted.
A warm bucket will stay warm (default location $SPLUNK_DB/IndexName/db) until the number of warm buckets crosses maxWarmDBCount, at which point the oldest bucket rolls to cold (default location $SPLUNK_DB/IndexName/colddb). The retention period is applied to buckets in the cold stage: the oldest cold bucket rolls to frozen (archived or deleted, per your setup).
If the indexer is restarted more than 300 times, it could indeed have 300 or more warm buckets. How much data the index holds depends on its retention period. Data in hot, warm, and cold buckets is all searchable.
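To make the end of that lifecycle concrete: frozen buckets are deleted unless an archive destination is configured. A hypothetical fragment (the path and retention value are illustrative):

[small_index]
frozenTimePeriodInSecs = 31536000       # roll buckets to frozen after ~1 year
coldToFrozenDir = /archive/small_index  # copy frozen buckets here instead of deleting them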


hunters_splunk
Splunk Employee

Hi Renems,

Simply put: use auto_high_volume when daily volume is over 10 GB; otherwise provide a specific size.
The max size of a bucket defaults to auto (750 MB).

Hope this helps. Thanks!
Hunter


inventsekar
SplunkTrust

Not sure if you have seen this already, but for other users' reference:

You should use "auto_high_volume" for high-volume indexes (such as the
main index); otherwise, use "auto". A "high volume index" would typically
be considered one that gets over 10GB of data per day.

from http://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Indexesconf

maxDataSize = < positive integer >| auto | auto_high_volume
  • The maximum size in MB for a hot DB to reach before a roll to warm is
    triggered.

  • Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this
    parameter (recommended).

  • You should use "auto_high_volume" for high-volume indexes (such as the
    main index); otherwise, use "auto". A "high volume index" would typically
    be considered one that gets over 10GB of data per day.

  • Defaults to "auto", which sets the size to 750MB.

  • "auto_high_volume" sets the size to 10GB on 64-bit, and 1GB on 32-bit
    systems.

  • Although the maximum value you can set this to is 1048576 MB, which
    corresponds to 1 TB, a reasonable number ranges anywhere from 100 to
    50000. Before proceeding with any higher value, please seek approval
    of Splunk Support.
  • If you specify an invalid number or string, maxDataSize will be auto
    tuned.

  • NOTE: The maximum size of your warm buckets may slightly exceed
    'maxDataSize', due to post-processing and timing issues with the rolling
    policy.


vr2312
Builder

So if I keep it as auto_high_volume, a hot bucket is going to move to WARM only after it reaches 10 GB.

Does that mean each bucket in WARM could be 10 GB, with the default maxWarmDBCount=300?

If that's the case, the HOT/WARM space used for the index would need to be increased.

Please let me know if I am understanding this right, @inventsekar
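One way to bound the worst case described above is to cap hot+warm storage directly with homePath.maxDataSizeMB: once the cap is exceeded, Splunk rolls the oldest warm bucket to cold even if maxWarmDBCount hasn't been reached. A hypothetical stanza (index name and sizes are illustrative):

[big_index]
maxDataSize = auto_high_volume    # hot buckets roll at ~10 GB on 64-bit systems
maxWarmDBCount = 300
homePath.maxDataSizeMB = 512000   # ~500 GB cap on hot+warm storage (example value)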


ddrillic
Ultra Champion

Do you have maxHotSpanSecs set? (The default is 90 days.) When you use auto_high_volume for slowly growing indexes, you can end up with large buckets that span numerous days, and since a bucket is an atomic storage block, you can run into issues with archiving/deleting.
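A sketch of the mitigation described above, with illustrative values: set maxHotSpanSecs explicitly so a slow-growing index rolls on a time trigger and each bucket covers only a small time range.

[slow_index]
maxDataSize = auto        # size trigger: ~750 MB
maxHotSpanSecs = 86400    # time trigger: also roll hot buckets daily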
