Deployment Architecture

How to manage the size of my indexes to fit in my volumes

mataharry
Communicator

On Splunk 4.1.5, how do I tune my data retention and rotation between the buckets? I have limited disk space and I want to split the indexes and buckets between several storage volumes.

2 Solutions

yannK
Splunk Employee

There is a good answer in the wiki: http://www.splunk.com/wiki/Deploy:BucketRotationAndRetention

What's an index? First of all, know that every index has its own locations and settings. The maximum size of your data on a volume will be the sum of all the indexes stored on that volume. The default index is called "main", but you may have created others, and specific apps may have their own. Splunk also uses its own internal indexes (_internal, _audit, history, summary, ...).

What's a bucket? A bucket is a unit of indexed data; physically, it is a directory containing the events of a certain time period. You may have several buckets at a time in each stage. See details here: http://www.splunk.com/base/Documentation/latest/Admin/HowSplunkstoresindexes

Example of the default path decomposition for the main index: $SPLUNK_HOME/var/lib/splunk/db/defaultdb/colddb/db_1288199229_1280423250_10/rawdata/94376288.gz

* defaultdb is the index
* colddb is the location for the buckets in the cold stage (also called the database location)
* db_1288199229_1280423250_10 is a specific bucket directory

Bucket stages A bucket rolls from one stage to another depending on certain conditions: Hot -> Warm -> Cold -> Frozen (-> Thawed)

* From hot to warm: when its size reaches the `maxDataSize` limit, when its lifetime exceeds `maxHotSpanSecs`, or when you roll the bucket with a manual command.
* From warm to cold: once `maxWarmDBCount` is reached, the oldest warm buckets are rolled.
* From cold to frozen: once `maxTotalDataSizeMB` is reached (for hot+warm+cold) or once `frozenTimePeriodInSecs` is exceeded. Those buckets are deleted, unless you defined a `coldToFrozenScript` to archive them somewhere.
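
Put together, the retention settings above might look like this in indexes.conf for a custom index (an illustrative sketch only; the index name `web`, the values, and the script path are invented for the example):

```ini
[web]
maxDataSize = auto                    # hot buckets roll to warm at ~750MB
maxHotSpanSecs = 86400                # or after the hot bucket spans one day
maxWarmDBCount = 100                  # keep at most 100 warm buckets
maxTotalDataSizeMB = 100000           # hot+warm+cold capped at ~100GB
frozenTimePeriodInSecs = 31536000     # freeze buckets whose events are older than ~1 year
# coldToFrozenScript = /opt/scripts/archive_bucket.sh   # optional: archive instead of delete
```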

Bucket locations The different stages of an index can each have a specific location; this is how you spread your data over different volumes:

* `homePath` location for the Hot and Warm buckets
    * Hot (intensive read and write; this is where indexing occurs)
    * Warm (mostly read, plus optimization)
* `coldPath` location for the Cold buckets (moved once, then read; used for searches only)
* `thawedPath` location for Thawed buckets (used only if you want to re-import frozen buckets)
* There is no Frozen location defined in Splunk, because the default action is to delete frozen buckets.
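
For example, to spread one index across two mount points, you would set the three paths in indexes.conf (the mount points below are hypothetical, shown only to illustrate the settings):

```ini
[main]
homePath   = /fast_disk/splunk/defaultdb/db          # hot+warm on fast local storage
coldPath   = /slow_disk/splunk/defaultdb/colddb      # cold on slower/cheaper storage
thawedPath = /slow_disk/splunk/defaultdb/thaweddb    # only used when restoring frozen buckets
```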

SPOILER:

In Splunk 4.2 it will be possible to split a location between several volumes, useful if you can't dynamically extend your disk partitions.

Recommendations A recommended setup is to define homePath on your local high-speed read+write RAID 1+0 disks, to define coldPath on slower disks with a good read speed (RAID 5) or on remote volumes, and, if you define a coldToFrozenScript, to move the frozen buckets to compressed backup tapes. Be aware that Splunk performance depends on the performance of the storage:

* indexing performance: linked to the write speed of the `homePath` location
* search performance: linked to the read speed of the `homePath` and `coldPath` locations.

For example: a search over a long period, using old data stored on remote volumes, will be slower than a specific search on recent events on the local high-speed volumes.

Example with the default settings Now let's see the size you should reserve for the main index. See the default configuration file $SPLUNK_HOME/etc/system/default/indexes.conf:

# global parameters
maxDataSize = auto # 750MB for auto, 10GB for auto_high_volume
maxWarmDBCount = 300
maxHotSpanSecs = 7776000 # after this time, the hot bucket is rolled to warm
frozenTimePeriodInSecs = 188697600
maxTotalDataSizeMB = 500000

[main]
homePath = $SPLUNK_DB/defaultdb/db
coldPath = $SPLUNK_DB/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
maxMemMB = 20
maxConcurrentOptimizes = 6
maxHotIdleSecs = 86400
maxHotBuckets = 10
maxDataSize = auto_high_volume # 750MB for auto, 10GB for auto_high_volume

Space taken by the whole index The maximum size of all buckets for an index is maxTotalDataSizeMB.

Space taken by the hot+warm buckets The maximum size for the hot+warm of the main index will be:

* (maxWarmDBCount + maxHotBuckets) * maxDataSize
* (300 + 10) * 750MB ≈ 227GB for auto
* (300 + 10) * 10GB = 3100GB for auto_high_volume

Space taken by the cold buckets Therefore the maximum size of the cold buckets will be:

* maxTotalDataSizeMB - "size of the hot+warm buckets"
* 500GB - 227GB = 273GB for auto
* 500GB - 3100GB = -2600GB for auto_high_volume: you will probably never have any cold buckets!
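
The arithmetic above can be sketched as a quick back-of-the-envelope helper (an illustrative script using the default values quoted above, not an official Splunk sizing tool):

```python
# Rough worst-case capacity estimate for one index, using the defaults
# quoted above. Illustrative only; real bucket sizes vary.

def hot_warm_max_mb(max_warm_db_count: int, max_hot_buckets: int,
                    max_data_size_mb: int) -> int:
    """Worst case for hot+warm: every bucket grown to its maximum size."""
    return (max_warm_db_count + max_hot_buckets) * max_data_size_mb

def cold_max_mb(max_total_data_size_mb: int, hot_warm_mb: int) -> int:
    """Space left for cold buckets; negative means cold may never fill."""
    return max_total_data_size_mb - hot_warm_mb

# maxDataSize = auto (750 MB per bucket)
hw_auto = hot_warm_max_mb(300, 10, 750)       # 232500 MB ~ 227 GB
cold_auto = cold_max_mb(500000, hw_auto)      # 267500 MB left for cold

# maxDataSize = auto_high_volume (10 GB = 10240 MB per bucket)
hw_high = hot_warm_max_mb(300, 10, 10240)     # 3174400 MB ~ 3100 GB
cold_high = cold_max_mb(500000, hw_high)      # negative: no room for cold buckets
```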

Space taken by the frozen buckets ... wait, there are none!

At the same time, buckets whose events are all older than frozenTimePeriodInSecs (~ 6 years by default) are removed from warm and cold, then deleted or archived out of Splunk.

Stop indexing if the remaining disk space is too small This is defined in server.conf:

minFreeSpace = 2000 # in MB by default

It's a good way to stop indexing when the free space on the warm+hot volume gets too small. If Splunk is installed on Windows C:, it's a good idea to increase this value to at least 2x RAM.

Voila! You should have enough elements to choose and configure your Splunk bucket policy.

Remember to define your own configuration in $SPLUNK_HOME/etc/system/local/ instead of touching the default files, and look into $SPLUNK_HOME/etc/system/README/ for configuration examples and explanations.


Genti
Splunk Employee

from indexes.conf.spec:

maxTotalDataSizeMB = <integer>
* The maximum size of an index (in MB). 
* If an index grows larger, the oldest data is frozen.
* Defaults to 500000.

maxDataSize = <integer, "auto", or "auto_high_volume">
* The maximum size in MBs for a hot db to grow before a roll to warm is triggered
* Specifying "auto" or "auto_high_volume" will cause Splunk to autotune this param based on your system (Recommended)
* You should use "auto_high_volume" for high volume indexes (such as the main
  index), otherwise use "auto".  A "high volume index" would typically be
  considered one that gets over 10GB of data per day.
* "auto" sets the size to 750MB, "auto_high_volume" to 10GB
* Although the maximum value you can set this to is 1048576 MB, which corresponds to 1 TB, a reasonable number ranges anywhere from 100 to 50000;
  any number outside this range should be approved by Splunk support before proceeding
* If you specify an invalid number or string for maxDataSize, maxDataSize will be auto tuned
* NOTE: The precise size of your warm buckets may vary from maxDataSize due to post-processing and timing issues with the rolling policy

Between these two and setting up different paths for the following:

homePath = <path on server>
* The path that contains the hot and warm databases and fields for the index.
* Splunkd keeps a file handle open for warm databases at all times.
* CAUTION: Path MUST be writable.

coldPath = <path on server>
* The path that contains the cold databases for the index.
* Cold databases are opened as needed when searching.
* CAUTION: Path MUST be writable.  

you should have enough info to make sure that you never run out of disk space.
Cheers,


gkanapathy
Splunk Employee

As of version 4.2, you have a much better option. You can specify volumes in indexes.conf, and then define paths to reside on volumes. A volume size will never be exceeded, so you don't have to spend a lot of time figuring out the max possible space that the other settings would take, and you don't wind up wasting space because you have to reserve it "just in case" a maximum is reached.
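
A sketch of the 4.2 volume syntax in indexes.conf (the paths and size caps below are invented for illustration; check indexes.conf.spec for your version before using this):

```ini
# Define named volumes with hard size caps
[volume:hot]
path = /fast_disk/splunk
maxVolumeDataSizeMB = 200000

[volume:cold]
path = /slow_disk/splunk
maxVolumeDataSizeMB = 500000

# Reference the volumes from the index paths
[main]
homePath = volume:hot/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
```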


