The Splunk 4.1.2 spec for indexes.conf reads:
maxHotSpanSecs =
* Upper bound of target max timespan of hot/warm buckets in seconds
* Defaults to 90 days
* NOTE: if you set this too small, you may get an explosion of hot/warm buckets in the filesystem. The system sets a lower bound implicitly for this parameter at 3600, but this is an advanced parameter that should be set with care and understanding of the characteristics of your data
I just need clarification: is this the maximum span from the earliest event in the warm buckets to the latest event in the hot bucket, or just the span from the earliest to the latest event within the hot bucket? The default (90 days) makes me think it's the span of all hot and warm events, but the parameter name leads me to believe it applies only to the hot bucket.
The backup guidelines state "hot bucket - Currently written to; non-incrementally changing; do not back this up." So, what I'm trying to do is tune our indexing policy to ensure we move from hot to warm at least daily in order to back up effectively. We are also considering using ZFS snapshots and just grabbing all hot and warm buckets, so this may be a non-issue.
You are correct: this setting applies to hot buckets, which are then rolled into warm buckets. By setting this to 43200, your buckets should span no more than 24 hours. However, note that the span is measured from the moment a hot bucket is first created, so if you have a hot bucket start at 15:00, it will include data from two different days.
Also, with multiple hot buckets, Splunk will now index data into different buckets depending on the timestamp, so you could end up with a lot of small buckets if you don't have enough data from a particular time period to fill entire buckets.
Actually, a day is 86400 seconds.
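The roll-from-creation behavior described above can be sketched as follows (illustrative only, not Splunk code; the timestamps are arbitrary examples):

```python
from datetime import datetime, timedelta

# With maxHotSpanSecs = 86400 (one day), the span is measured from when
# the hot bucket is first created, not from midnight.
max_hot_span_secs = 86400

bucket_created = datetime(2010, 6, 1, 15, 0)  # hot bucket starts at 15:00
roll_deadline = bucket_created + timedelta(seconds=max_hot_span_secs)

print(roll_deadline)  # 2010-06-02 15:00:00
# The bucket can therefore hold events from two different calendar days.
print(bucket_created.date() != roll_deadline.date())  # True
```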
Next, if your maxHotSpanSecs is 3600 (one hour) or 86400 (one day), then the "snapping" behavior is turned on, meaning that those buckets will roll at the top of the hour or day, respectively.
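As a sketch, the setting might look like this in indexes.conf (the [secure] stanza name is just an example index):

```ini
[secure]
# Roll hot buckets after at most one day of event timespan.
# 3600 (hourly) and 86400 (daily) also enable "snapping", so buckets
# roll at the top of the hour or day respectively.
maxHotSpanSecs = 86400
```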
If you are going to change maxHotSpanSecs, it may be worth not jumping down to one day all at once. It may be beneficial to try, for example, a one-week window first; then you can lower it further if you like the results after a few weeks. However, I suspect you may be surprised at how often events come in for an older date range, in which case you could end up creating tons of buckets. Of course, this is highly specific to your environment, which is why I'm suggesting the take-it-slow approach.
An alternative is to have your scheduled backup script kick off an explicit bucket roll (e.g.
$SPLUNK_HOME/bin/splunk search '| debug cmd=roll index=index_name') before doing the backup. This is an approach we used to use, but we now rely on LVM snapshots, which have been working pretty well. Another consideration is that a bucket roll in Splunk 4 rolls ALL hot buckets, so this has the potential to create even more buckets.
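A minimal sketch of such a pre-backup script is below. The roll_cmd helper, the index name default, and the commented backup step are hypothetical placeholders; only the '| debug cmd=roll' search itself comes from the thread, and the actual roll requires a live Splunk instance.

```shell
#!/bin/sh
# Sketch: roll hot buckets before backing up the now-stable warm buckets.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
INDEX_NAME="${INDEX_NAME:-index_name}"

roll_cmd() {
    # Build the explicit bucket-roll search string for a given index.
    printf '| debug cmd=roll index=%s' "$1"
}

# Uncomment on a live indexer, then run your backup of the warm buckets:
# "$SPLUNK_HOME/bin/splunk" search "$(roll_cmd "$INDEX_NAME")"
```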
If you haven't looked at the backup topic on the Community wiki, I recommend that page:
I tried running your script but get the following errors:
[splunk@nfldevspc01 ~]$ splunk search '| debug cmd=roll index=secure'
Couldn't complete HTTP request: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure