I would like to know if there is a way to create new buckets at a defined interval, say a 5 minutes interval?
I have been trying to tar an index every 5 minutes using a cron job, but the size of the tarball increases over time. For example:
The first 5 minutes: 1MB - in a tarball with a unique time-based ID
The second 5 minutes: 2MB (fresh 1MB + previous 1MB) - in another tarball with a unique time-based ID
The third 5 minutes: 3MB (fresh 1MB + previous 2MB) - in another tarball with a unique time-based ID
The fourth 5 minutes: 4MB (fresh 1MB + previous 3MB) - in another tarball with a unique time-based ID
The fifth 5 minutes: 5MB (fresh 1MB + previous 4MB) - in another tarball with a unique time-based ID
...
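The growth pattern above is what you get when each cron run archives the whole index directory, so every tarball also contains all earlier data. A self-contained sketch of that behaviour (demo paths, not real Splunk bucket paths):

```shell
#!/bin/sh
# Demo of why the tarballs grow: each run tars the ENTIRE directory,
# so every new tarball includes all previously written data too.
set -e
work=$(mktemp -d)
mkdir -p "$work/indexdir" "$work/backups"

for i in 1 2 3; do
  # ~1MB of "fresh" data per simulated 5-minute interval
  dd if=/dev/zero of="$work/indexdir/chunk$i" bs=1024 count=1024 2>/dev/null
  tar -czf "$work/backups/interval$i.tar.gz" -C "$work/indexdir" .
done
```

Listing the third tarball shows it contains all three chunks, which matches the cumulative sizes described above.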
Ideally, I would want to see:
The first 5 minutes: 1MB - in a tarball with a unique time-based ID
The second 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The third 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The fourth 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The fifth 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
...
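At the filesystem level, the desired behaviour amounts to archiving only files created since the previous run. One way to sketch that is with a marker file and `find -newer`; the paths below are a self-contained demo, not real Splunk paths:

```shell
#!/bin/sh
# Sketch of the DESIRED behaviour: each interval's tarball contains only
# the files written since the previous run, tracked via a marker file.
set -e
work=$(mktemp -d)
mkdir -p "$work/indexdir" "$work/backups"
marker="$work/last_run"
touch -t 197001010000 "$marker"   # baseline so the first run picks up everything

for i in 1 2 3; do
  dd if=/dev/zero of="$work/indexdir/chunk$i" bs=1024 count=1024 2>/dev/null
  # Only files modified since the previous run go into this interval's tarball.
  find "$work/indexdir" -type f -newer "$marker" > "$work/list"
  tar -czf "$work/backups/interval$i.tar.gz" -T "$work/list" 2>/dev/null
  touch "$marker"
  sleep 1   # keep mtimes distinct for the demo
done
```

Each tarball now holds exactly one fresh chunk. Whether doing this against live Splunk bucket directories is safe is a separate question, which the answers below address.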
Is there documentation I can look at? Which conf file should I configure, and what should the stanza look like?
Why are you trying to do this? Whatever you are trying to accomplish, I believe this is a deeply misguided approach. I assume you're trying to create a backup or replica of a Splunk index in as close to real time as possible, but doing it this way will massively hurt search performance (unless you're managing to index a couple of GB every 5 minutes on a single node). Please look at forwarding, distributed search, or filesystem snapshots, or rethink whether this copy or replica is even needed.
Have you tried simply setting:
[your_index]
maxHotSpanSecs = 150
maxHotBuckets = 6
quarantinePastSecs = 600
quarantineFutureSecs = 120
Maybe I misunderstand what maxHotSpanSecs is supposed to do, but it sounds like it should create a separate hot bucket for each 5-minute window of data. So as long as the maximum number of hot buckets is low, this should trigger a bucket roll every few minutes. It will not be exact, but it could be close enough.
Of course, this assumes that you have a very minimal indexing delay. If your events are delayed by more than a few minutes, this could cause a massive number of buckets to be created and rolled very quickly. Imagine a forwarder is down for a day: when it comes back up (and has at least one event every 5 minutes), you're looking at the possible creation of over 500 buckets. So you'll probably want to set your quarantine interval to be very low. (Update: I've added some possible quarantine settings above; these are just a guess.)
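The "over 500 buckets" figure checks out as a back-of-the-envelope calculation, assuming roughly one bucket per maxHotSpanSecs window while replaying a full day of delayed data:

```python
# Rough check: one bucket per maxHotSpanSecs window over one day of backlog.
seconds_per_day = 24 * 60 * 60      # 86400
max_hot_span_secs = 150             # value from the stanza above
buckets = seconds_per_day // max_hot_span_secs
print(buckets)                      # 576 -- i.e. "over 500"
```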
The other option is manually calling the bucket rotation by running the following search every 5 minutes:
| debug cmd=roll index=index_name
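If you go the scheduled-roll route, one way to drive it might be a cron entry invoking the Splunk CLI. The install path and credentials below are placeholders, not values from your environment:

```shell
# Hypothetical crontab entry: roll the hot buckets of index_name every 5 minutes.
# /opt/splunk and admin:changeme are assumptions -- adjust for your setup.
*/5 * * * * /opt/splunk/bin/splunk search '| debug cmd=roll index=index_name' -auth admin:changeme
```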
Of course there are downsides to this too.
WARNING: This stuff is all really tricky. You should really test it all out on a test environment before you try it on a production system.
No, bucket sizes are generally determined by the amount of data they contain, not by how old they are. The default size for a hot bucket on a 64-bit system is around the 10000MB mark, controlled by the maxDataSize setting in indexes.conf, but Splunk will also roll a smaller hot bucket if it has been more than 24 hours since that bucket was updated; maxHotIdleSecs controls this.
The closest you will get to rolling every 5 minutes would be to figure out how much data is indexed in a 5-minute interval and set maxDataSize to that value.
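To make that concrete, here is a hypothetical stanza, assuming measurement showed roughly 50 MB indexed per 5-minute window for this index (an integer maxDataSize is interpreted as a size in MB):

```
[your_index]
# Assumed: ~50 MB indexed per 5-minute window
maxDataSize = 50
```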
All of the settings for indexes.conf are documented here.