
Create new buckets at a defined interval

Nicholas_Key
Splunk Employee

Is there a way to create new buckets at a defined interval, say every 5 minutes?

I have been trying to tar an index every 5 minutes using a cron job, but each tarball is larger than the last, because every run includes all of the previously archived data. For example:

The first 5 minutes: 1MB - in a tarball with a unique time-based ID
The second 5 minutes: 2MB (fresh 1MB + previous 1MB) - in another tarball with a unique time-based ID
The third 5 minutes: 3MB (fresh 1MB + previous 2MB) - in another tarball with a unique time-based ID
The fourth 5 minutes: 4MB (fresh 1MB + previous 3MB) - in another tarball with a unique time-based ID
The fifth 5 minutes: 5MB (fresh 1MB + previous 4MB) - in another tarball with a unique time-based ID
...
...
...

Ideally, I would want to see:

The first 5 minutes: 1MB - in a tarball with a unique time-based ID
The second 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The third 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The fourth 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
The fifth 5 minutes: 1MB (fresh 1MB) - in another tarball with a unique time-based ID
...
...
...

Is there any documentation I can look at? Which conf file should I configure, and what should the stanza look like?

Nicholas


gkanapathy
Splunk Employee

Why are you trying to do this? Whatever you are trying to accomplish, I believe this is a deeply misguided approach. I assume you're trying to create a backup or replica of a Splunk index in as close to real time as possible, but rolling buckets every 5 minutes will massively hurt search performance (unless you're managing to index a couple of GB every 5 minutes on a single node), since searches then have to open a very large number of tiny buckets. Please look at forwarding, distributed search, or filesystem snapshots, or rethink whether this copy or replica is even needed.

Lowell
Super Champion

Have you tried simply setting:

indexes.conf:

[your_index]
maxHotSpanSecs = 150
maxHotBuckets = 6
quarantinePastSecs = 600
quarantineFutureSecs = 120

Maybe I misunderstand what maxHotSpanSecs is supposed to do, but it sounds like this should create a separate hot bucket for each 5-minute window of data. So as long as the maximum number of hot buckets is low, this should trigger a bucket roll every few minutes. It will not be exact, but it could be close enough.

Of course, this assumes that you have very minimal indexing delay. If your events are delayed by more than a few minutes, this could cause a massive number of buckets to be created and rolled very quickly. Imagine a forwarder is down for a day: when it comes back up (and has at least one event every 5 minutes), you're looking at the possible creation of over 500 buckets. So you'll probably want to set your quarantine intervals very low. (Update: I've added some possible quarantine settings above; these are just a guess.)
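To put a rough number on that backlog scenario, here is a small back-of-the-envelope sketch. It assumes each bucket covers at most maxHotSpanSecs seconds of event time and that every window contains at least one event; the function name is made up for illustration and nothing here calls Splunk itself.

```python
import math


def buckets_for_backlog(backlog_secs: int, max_hot_span_secs: int) -> int:
    """Estimate how many buckets a backlog of event time could be split into.

    Assumption: one bucket per max_hot_span_secs window of event time,
    with at least one event in every window.
    """
    if backlog_secs < 0 or max_hot_span_secs <= 0:
        raise ValueError("backlog must be >= 0 and span must be > 0")
    return math.ceil(backlog_secs / max_hot_span_secs)


ONE_DAY_SECS = 24 * 60 * 60

# maxHotSpanSecs = 150, as in the stanza above
print(buckets_for_backlog(ONE_DAY_SECS, 150))  # 576

# a strict 5-minute (300 s) span, for comparison
print(buckets_for_backlog(ONE_DAY_SECS, 300))  # 288
```

With the 150-second span that is where "over 500" comes from; a 300-second span would still mean a few hundred buckets for a single day of delayed data.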


The other option is to trigger the bucket roll manually by running the following search every 5 minutes:

| debug cmd=roll index=index_name

Of course there are downsides to this too.


WARNING: This stuff is all really tricky. You should test it thoroughly in a test environment before you try it on a production system.


Mick
Splunk Employee

No, bucket sizes are generally determined by the amount of data they contain, not by how old they are. The default size for a hot bucket on a 64-bit system is around the 10000MB mark, controlled by the maxDataSize setting in indexes.conf. Splunk will also roll a smaller hot bucket if it has been more than 24 hours since that bucket was last updated; the maxHotIdleSecs setting controls this.

The closest you will get to rolling every 5 minutes would be to figure out how much data is indexed in a 5-minute interval and set maxDataSize to that value.

All of the settings for indexes.conf are documented here
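As a rough illustration of that sizing calculation, the sketch below converts a daily volume into a per-interval value in MB, which is the unit maxDataSize uses for numeric values. It assumes a steady indexing rate and that you already know how much bucket data the index accumulates on disk per day (which is not the same as raw event volume); the function name and the sample figures are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def max_data_size_mb(daily_bucket_mb: float, interval_secs: int) -> int:
    """Suggest a maxDataSize (whole MB) so a bucket fills in about interval_secs.

    Assumption: data arrives at a constant rate all day. Real traffic is
    bursty, so the actual roll interval will drift around the target.
    """
    if daily_bucket_mb <= 0 or interval_secs <= 0:
        raise ValueError("volume and interval must be positive")
    per_interval_mb = daily_bucket_mb * interval_secs / SECONDS_PER_DAY
    # Round up and never go below 1 MB.
    return max(1, math.ceil(per_interval_mb))


# The question's example: about 1 MB every 5 minutes, i.e. 288 MB per day
print(max_data_size_mb(288, 300))     # 1

# A hypothetical busier index: 50,000 MB of bucket data per day
print(max_data_size_mb(50_000, 300))  # 174
```

Because the roll is driven by size rather than time, a quiet period stretches the interval and a burst shortens it, so treat the result as a starting point to tune rather than an exact timer.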
