Hi,
I found myself on a site where EVERY index is configured with auto_high_volume. I'm aware that it is best practice to use this for large indices, but what exactly is the harm in having this on smaller indices? Can anyone explain?
Thx!
The lifecycle of the buckets is hot -> warm -> cold -> frozen. The retention policy is enforced based on cold buckets only, so if there is no cold bucket, the retention policy will not be applied.
Incoming data is written into hot buckets. The default number of hot buckets is 3 (maxHotBuckets). Data stays in a hot bucket until 1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, or 3) Splunk is restarted. Once rolled, the bucket moves to the warm stage. It becomes read-only there and stays in the warm db stage until the number of warm buckets exceeds maxWarmDBCount (default 300), at which point the oldest warm bucket rolls to the cold db.
If the incoming volume is low and maxDataSize is set to auto_high_volume, the data may stay in hot or warm buckets indefinitely, since there is no way to cross the default 300 warm bucket count with so little volume (unless you restart Splunk 300+ times). So there won't be any cold buckets, and data retention will not be applied.
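For illustration, here is a minimal indexes.conf sketch of the settings discussed above; the index name, paths, and values are hypothetical, not a recommendation:

# Hypothetical low-volume index (illustrative values only)
[my_small_index]
homePath   = $SPLUNK_DB/my_small_index/db
coldPath   = $SPLUNK_DB/my_small_index/colddb
thawedPath = $SPLUNK_DB/my_small_index/thaweddb
# auto_high_volume sizes hot buckets at 10GB on 64-bit systems; at, say,
# 50MB/day a single hot bucket could take months to fill and roll to warm
maxDataSize = auto_high_volume
maxHotBuckets = 3                  # default number of hot buckets
maxWarmDBCount = 300               # warm buckets allowed before the oldest rolls to cold
frozenTimePeriodInSecs = 31536000  # 1-year retention; per the explanation above, only effective once buckets reach cold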
@somesoni2 nice explanation
1> What happens if we have set the retention period to 1 year, given the roll conditions you listed (1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, 3) Splunk is restarted)?
2> After 90 days, does the hot bucket roll to warm and stay there?
3> What if the indexer is restarted more than 300 times? Will it still hold the data for 1 year?
Regardless of the retention period, hot buckets will roll to warm if either maxHotSpanSecs or maxDataSize is reached, or if Splunk is restarted.
A warm bucket will stay warm (default location $SPLUNK_DB/IndexName/db) until the warm bucket count crosses maxWarmDBCount, at which point the oldest bucket rolls to cold (default location $SPLUNK_DB/IndexName/colddb). The retention period is applied to buckets in the cold stage, and it starts rolling the oldest cold bucket to frozen (archived or deleted, per what you've set up).
If the indexer is restarted more than 300 times, it could have 300 or more warm buckets. How much data it holds depends on its retention period. Data in hot, warm, and cold buckets is all searchable.
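To make the 1-year case concrete, here is a hedged sketch of the retention-related settings; the index name and values are illustrative, and coldToFrozenDir is only needed if you want to archive frozen buckets instead of deleting them:

# Hypothetical index with 1-year retention (illustrative values only)
[my_index]
maxHotSpanSecs = 7776000             # roll hot -> warm after 90 days at the latest
maxWarmDBCount = 300                 # oldest warm bucket rolls to cold beyond this count
frozenTimePeriodInSecs = 31536000    # 1 year; expired buckets roll to frozen
# optional: archive frozen buckets here instead of deleting them
coldToFrozenDir = /archive/my_index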
Hi Renems,
Simply put: Use auto_high_volume when daily volume is >10 GB, or provide a specific size.
The max size of a bucket defaults to auto (750 MB).
Hope this helps. Thanks!
Hunter
Not sure if you have seen this already, but for other users' reference:
You should use "auto_high_volume" for high-volume indexes (such as the
main index); otherwise, use "auto". A "high volume index" would typically
be considered one that gets over 10GB of data per day.
from http://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Indexesconf
maxDataSize = <positive integer> | auto | auto_high_volume
The maximum size in MB for a hot DB to reach before a roll to warm is triggered.
Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this parameter (recommended).
You should use "auto_high_volume" for high-volume indexes (such as the main index); otherwise, use "auto". A "high volume index" would typically be considered one that gets over 10GB of data per day.
Defaults to "auto", which sets the size to 750MB.
"auto_high_volume" sets the size to 10GB on 64-bit, and 1GB on 32-bit systems.
Although the maximum value you can set this is 1048576 MB, which corresponds to 1 TB, a reasonable number ranges anywhere from 100 to 50000. If you specify an invalid number or string, maxDataSize will be auto tuned.
NOTE: The maximum size of your warm buckets may slightly exceed 'maxDataSize', due to post-processing and timing issues with the rolling policy.
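Putting the docs excerpt into practice, here is a sketch of how the three forms might be applied per index; the index names below are made up and the explicit size is just an example:

# Illustrative only; pick values for your own volumes
[main]
maxDataSize = auto_high_volume   # high-volume index: 10GB hot buckets on 64-bit

[medium_index]
maxDataSize = auto               # default: 750MB hot buckets

[small_index]
maxDataSize = 500                # explicit size in MB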
So if I keep it as auto_high_volume, a hot bucket is going to move to WARM only after reaching 10GB.
Does that mean each bucket in WARM would be 10GB, with the default maxWarmDBCount=300? If that is the case, then the HOT/WARM space used for the index would need to be increased.
Please let me know if I am understanding this right @inventsekar
Do you have maxHotSpanSecs set? (Otherwise it defaults to 90 days, as noted above.) When you have auto_high_volume for slowly growing indexes, you can end up with large buckets that span numerous days, and since a bucket is an atomic storage block, you can end up having issues with archiving/deleting. One way to keep bucket time spans narrow is sketched below.
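If that is the concern, one option is to cap how long a hot bucket can stay open so slow indexes still roll on a schedule; the index name and the one-day value are hypothetical, not a recommendation:

# Illustrative: force daily rolls on a slowly growing index
[my_slow_index]
maxDataSize = auto        # 750MB is plenty for a slow index
maxHotSpanSecs = 86400    # hot buckets roll at least daily, keeping each bucket's time span narrow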