How do I configure my indexes so that hot buckets roll to warm at least daily for effective backups?

jeff
Contributor

The Splunk 4.1.2 spec for indexes.conf reads:

maxHotSpanSecs = <positive integer>
   * Upper bound of target max timespan of hot/warm buckets in seconds
   * Defaults to 90 days
   * NOTE: if you set this too small, you may get an explosion of hot/warm
     buckets in the filesystem.  The system sets a lower bound implicitly for
     this parameter at 3600, but this is an advanced parameter that should be set
     with care and understanding of the characteristics of your data

I just need clarification: is this the maximum span between the earliest event in the warm bucket to the latest event in the hot bucket, or the earliest - latest event in the hot bucket? The default (90 days) makes me think it's the span of all hot and warm events, but the parameter name leads me to believe it's only for the hot bucket.

The backup guidelines state "hot bucket - Currently written to; non-incrementally changing; do not back this up." So what I'm trying to do is tune the indexing policy to ensure we move from hot to warm at least daily in order to back up effectively. We are also considering using ZFS snapshots and just grabbing all hot and warm buckets, so this might be a non-issue.

1 Solution

mctester
Communicator

You are correct: this setting applies to hot buckets, which are then rolled into warm buckets. If you set this to 43200, your buckets should span no more than 24 hours. However, note that the span is measured from the moment a hot bucket is first created, so if a hot bucket starts at 15:00, it will include data from 2 different days.

Also, with multiple hot buckets, Splunk will now index data into different buckets depending on the timestamp, so you could end up with a lot of small buckets if you don't have enough data from a particular time period to fill entire buckets.

Lowell
Super Champion

If you are going to change maxHotSpanSecs, it may be worth not jumping down to 1 day all at once. It may be beneficial to try, for example, a 1 week window first. Then you can lower it more if you like the results after a few weeks. However, I suspect that you may be surprised at how often events come in for an older date range, in which case you could end up creating tons of buckets. Of course, this is highly specific to your environment, which is why I'm suggesting the take-it-slow approach.

An alternative is to have your scheduled backup script kick off an explicit bucket roll (e.g. $SPLUNK_HOME/bin/splunk search '| debug cmd=roll index=index_name') before doing the backup. This is an approach that we used to use, but now we rely on LVM snapshots, which have been working pretty well. Another consideration is that a bucket roll in Splunk 4 means that ALL hot buckets are rolled, so this has the potential to create even more buckets.
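The roll-then-back-up approach could look roughly like the sketch below. It is not the script we actually ran. The function names are hypothetical, and it assumes the index's home path is $SPLUNK_DB/<index>/db, that warm buckets are directories named db_*, and that hot buckets are named hot_*. Check those assumptions against your own indexes.conf before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions not confirmed by this thread:
#   - the index's home path is $SPLUNK_DB/<index>/db
#   - warm buckets are directories named db_*, hot buckets hot_*

# Print the warm bucket directories directly under an index's home path.
list_warm_buckets() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name 'db_*' | sort
}

# Roll hot buckets to warm, then copy every warm bucket into dest_dir.
backup_index() {
    local index="$1" dest_dir="$2"
    local home_path="${SPLUNK_DB:?SPLUNK_DB is not set}/$index/db"

    # Roll command as quoted in the post; in Splunk 4 this rolls ALL hot buckets.
    "${SPLUNK_HOME:?SPLUNK_HOME is not set}/bin/splunk" search "| debug cmd=roll index=$index" || return 1

    mkdir -p "$dest_dir" || return 1
    list_warm_buckets "$home_path" | while IFS= read -r bucket; do
        cp -R -p "$bucket" "$dest_dir/" || exit 1
    done
}
```

A cron job might then call something like `backup_index index_name /path/to/backup/dir`, where the destination path is a placeholder. Copying only db_* directories keeps any hot bucket created after the roll out of the backup.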

If you haven't looked at the backup topic on the Community wiki, I recommend this page:

jsburt
New Member

I tried running your script but get the following errors:
[splunk@nfldevspc01 ~]$ splunk search '| debug cmd=roll index=secure'
Couldn't complete HTTP request: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
[

sowings
Splunk Employee

Actually, a day is 86400 seconds.

Next, if your maxHotSpanSecs is 3600 (one hour) or 86400 (one day), then the "snapping" behavior is turned on, meaning that those buckets will roll at the top of the hour or day respectively.

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Indexesconf
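As a quick check of the arithmetic, here is a small shell sketch that computes one day in seconds and prints a matching indexes.conf stanza. The index name my_index and the helper print_stanza are placeholders for illustration, not anything Splunk ships.

```shell
#!/usr/bin/env bash
# Hypothetical index name; replace with your own.
index_name="my_index"

# One day in seconds: 24 h * 60 min * 60 s.
one_day_secs=$((24 * 60 * 60))

# Print a stanza that could be pasted into indexes.conf.
print_stanza() {
    printf '[%s]\nmaxHotSpanSecs = %s\n' "$1" "$2"
}

print_stanza "$index_name" "$one_day_secs"
# prints:
# [my_index]
# maxHotSpanSecs = 86400
```

Setting the value to exactly 86400 (rather than something close to it) matters if you want the snapping behavior described above.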
