Getting Data In

What is the harm of using auto_high_volume on smaller indexes?

renems
Communicator

Hi,

I found myself at a site where EVERY index is configured with auto_high_volume. I'm aware that it's best practice to use this for large indexes, but what exactly is the harm in having it on smaller indexes? Can anyone explain?

Thx!

1 Solution

somesoni2
Revered Legend

The lifecycle of a bucket is hot -> warm -> cold -> frozen. The retention policy is enforced based on cold buckets only, so if there are no cold buckets, the retention policy will not be applied.
Incoming data is written to hot buckets. The default number of hot buckets is 3 (maxHotBuckets). Data stays in a hot bucket until 1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, or 3) Splunk is restarted. Once rolled, the bucket moves to the warm stage. It is read-only there and stays warm until the number of warm buckets exceeds maxWarmDBCount (default 300), at which point the oldest warm bucket rolls to cold.

If the incoming volume is too low and maxDataSize is set to auto_high_volume, the data may stay in hot or warm buckets forever: with so little volume there is no way you'll cross the default warm bucket count of 300 (unless you restart Splunk 300+ times). So there won't be any cold buckets, and data retention will never be applied.
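As a sketch of the fix described above, a low-volume index can be given a bucket size it will actually fill. The stanza below is hypothetical (the index name, paths, and retention value are illustrative, not from this thread):

[small_index]
homePath   = $SPLUNK_DB/small_index/db
coldPath   = $SPLUNK_DB/small_index/colddb
thawedPath = $SPLUNK_DB/small_index/thaweddb
# "auto" rolls hot buckets at ~750 MB instead of auto_high_volume's 10 GB,
# so warm buckets actually accumulate, eventually roll to cold, and
# retention (frozenTimePeriodInSecs) can take effect.
maxDataSize = auto
frozenTimePeriodInSecs = 31536000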


athorat
Communicator

@somesoni2 nice explanation

1> What happens if we have set the retention period to 1 year, in the case that 1) maxHotSpanSecs is reached (default 90 days), 2) maxDataSize is reached, or 3) Splunk is restarted?

2> After 90 days, does the hot bucket roll to warm and stay there?
3> What if the indexer is restarted more than 300 times? Will it still hold the data for 1 year?


somesoni2
Revered Legend

Regardless of the retention period, a hot bucket will roll to warm if either maxHotSpanSecs or maxDataSize is reached, or if Splunk is restarted.
A warm bucket will stay warm (default location $SPLUNK_DB/IndexName/db) until the number of warm buckets crosses maxWarmDBCount, at which point the oldest bucket rolls to cold (default location $SPLUNK_DB/IndexName/colddb). The retention period is applied to buckets in the cold stage: the oldest cold bucket rolls to frozen (archived or deleted, per your setup).
If the indexer is restarted more than 300 times, it could indeed have 300 or more warm buckets. How much data the index holds depends on its retention period. Data in hot, warm, and cold buckets is all searchable.
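To make the end of that lifecycle concrete: frozen buckets are deleted unless an archive destination is configured. A hypothetical fragment (the path and retention value are illustrative):

[small_index]
frozenTimePeriodInSecs = 31536000       # roll buckets to frozen after ~1 year
coldToFrozenDir = /archive/small_index  # copy frozen buckets here instead of deleting them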


hunters_splunk
Splunk Employee

Hi Renems,

Simply put: use auto_high_volume when daily volume is over 10 GB; otherwise provide a specific size.
The max size of a bucket defaults to auto (750 MB).

Hope this helps. Thanks!
Hunter


inventsekar
SplunkTrust

Not sure if you have seen this already, but for other users' reference:

You should use "auto_high_volume" for high-volume indexes (such as the
main index); otherwise, use "auto". A "high volume index" would typically
be considered one that gets over 10GB of data per day.

from http://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Indexesconf

maxDataSize = < positive integer >| auto | auto_high_volume
  • The maximum size in MB for a hot DB to reach before a roll to warm is
    triggered.

  • Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this
    parameter (recommended).

  • You should use "auto_high_volume" for high-volume indexes (such as the
    main index); otherwise, use "auto". A "high volume index" would typically
    be considered one that gets over 10GB of data per day.

  • Defaults to "auto", which sets the size to 750MB.

  • "auto_high_volume" sets the size to 10GB on 64-bit, and 1GB on 32-bit
    systems.

  • Although the maximum value you can set this to is 1048576 MB, which
    corresponds to 1 TB, a reasonable number ranges anywhere from 100 to
    50000. Before proceeding with any higher value, please seek approval
    of Splunk Support.
  • If you specify an invalid number or string, maxDataSize will be auto
    tuned.

  • NOTE: The maximum size of your warm buckets may slightly exceed
    'maxDataSize', due to post-processing and timing issues with the rolling
    policy.


vr2312
Builder

So if I keep it as auto_high_volume, a hot bucket is going to move to WARM only after it reaches 10 GB.

Does that mean each bucket in WARM could be 10 GB, with the default maxWarmDBCount=300?

If that's the case, the HOT/WARM space used for the index would need to be increased.

Please let me know if I am understanding this right, @inventsekar
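One way to bound the worst case described above is to cap hot+warm storage directly with homePath.maxDataSizeMB: once the cap is exceeded, Splunk rolls the oldest warm bucket to cold even if maxWarmDBCount hasn't been reached. A hypothetical stanza (index name and sizes are illustrative):

[big_index]
maxDataSize = auto_high_volume    # hot buckets roll at ~10 GB on 64-bit systems
maxWarmDBCount = 300
homePath.maxDataSizeMB = 512000   # ~500 GB cap on hot+warm storage (example value)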


ddrillic
Ultra Champion

Do you have maxHotSpanSecs set? (The default is 90 days.) When you use auto_high_volume for slowly growing indexes, you can end up with large buckets that span numerous days, and since a bucket is an atomic storage block, you can run into issues with archiving/deleting.
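A sketch of the mitigation described above, with illustrative values: set maxHotSpanSecs explicitly so a slow-growing index rolls on a time trigger and each bucket covers only a small time range.

[slow_index]
maxDataSize = auto        # size trigger: ~750 MB
maxHotSpanSecs = 86400    # time trigger: also roll hot buckets daily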
