Getting Data In

Can I optimise search by increasing hot buckets?

BobM
Builder

Three questions in one.

  1. Are hot buckets faster than warm buckets for search?
  2. If so, is it because they are in memory or because the file is already open?
  3. Is it a good idea to have 30+ hot buckets to speed up data access?

For background, we are indexing 100GB/day, and searches over a few hours seem slow, so we are looking for ways to optimise.

1 Solution

sowings
Splunk Employee

Hot buckets are not faster; they're merely the ones which are being written to. Increasing the number of them can help search performance, but in a subtle way: see below.

Sometimes, when you're indexing a lot of data from different sources, the subtle time differences between machines mean that events arriving at the indexer are slightly offset from one another in time. Splunk likes to keep the timeline relatively smooth within a given bucket, so it might write event #1 to one bucket, but event #2 to another, to align with the time of events already in those buckets.

So now a new event arrives, and it's got a time stamp that belongs in neither bucket #1 nor bucket #2. Splunk creates a new bucket. But if I now have more hot buckets than the maximum allowed, it's time to rotate one to warm. Let's say we selected bucket #2 to go to warm. Now it's closed up, its files are no longer being written to, and it enters the warm state. But bucket #2 was only 100M when it was rolled. That's pretty small for a bucket, especially when you're indexing 100G / day.
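To make the rolling effect concrete, here is a toy Python model. It is only a sketch and not Splunk's actual bucket selection logic: the fixed 60-second time window per bucket, the "roll the oldest hot bucket" rule, and the function name `simulate` are all assumptions made for illustration. Three sources with badly skewed clocks send interleaved events, and the model counts how many events each bucket held when it was rolled to warm.

```python
from collections import OrderedDict


def simulate(event_times, max_hot, window=60):
    """Toy model: return the event count of each bucket rolled to warm.

    Assumptions (not Splunk's real policy): each hot bucket covers one
    fixed time window, and the oldest hot bucket is rolled when a new
    bucket is needed and the hot limit has been reached.
    """
    hot = OrderedDict()  # window start -> event count, in creation order
    rolled = []
    for t in event_times:
        key = t // window * window  # start of the window this event falls in
        if key not in hot:
            if len(hot) == max_hot:
                _, count = hot.popitem(last=False)  # roll the oldest bucket
                rolled.append(count)
            hot[key] = 0
        hot[key] += 1
    return rolled


# Three hypothetical sources whose clocks are an hour apart, interleaved.
event_times = [i + offset for i in range(120) for offset in (0, 3600, 7200)]

few_hot = simulate(event_times, max_hot=1)
enough_hot = simulate(event_times, max_hot=3)
print(len(few_hot), max(few_hot))
print(len(enough_hot), enough_hot)
```

With a single hot bucket the model rolls on almost every event and leaves hundreds of one-event buckets behind; with one hot bucket per source it rolls only when the time window moves on, so the warm buckets are far fewer and far fuller.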

Here is where search performance comes in: if you're rolling buckets too fast and ending up with a lot of small buckets, search performance suffers, because we have to open more and more buckets to find the events.
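A rough back-of-the-envelope calculation in Python shows the scale of the difference. It is a sketch under simplifying assumptions: a steady indexing rate, every bucket rolled at exactly the stated size, and each bucket covering a contiguous slice of time. The function name `buckets_touched` and the three-hour search window are illustrative choices, not anything Splunk reports.

```python
import math


def buckets_touched(daily_gb, bucket_mb, search_hours):
    """Estimate how many buckets a time-range search has to open."""
    # Data indexed during the search window, in MB (steady rate assumed).
    data_mb = daily_gb * 1024 * search_hours / 24
    return math.ceil(data_mb / bucket_mb)


# 100 GB/day, searching a 3-hour window.
small = buckets_touched(100, 100, 3)         # buckets rolled early at ~100 MB
large = buckets_touched(100, 10 * 1024, 3)   # buckets allowed to reach ~10 GB
print(small, large)
```

Under these assumptions the same three-hour search opens 128 buckets when they are rolled at 100 MB, against 2 when they are allowed to grow to 10 GB.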

You can see why buckets are being rolled with a search like this one:


index=_internal source=*splunkd.log databasePartitionPolicy moving

You'll get events from Splunk which indicate why the bucket went from hot to warm. If it's for reasons like "exceeded maxHotBuckets", then you might not have enough. The "main" index has defaults set up for indexing a lot of data. It uses ten (10) max hot buckets, and uses the "auto_high_volume" parameter for a size limit (10G on 64-bit systems). If you're indexing at a high volume to an index other than main, it might benefit you to mimic some of the config of the main index.

Finally, have a look at ways to evaluate search performance and optimize your searches.

bmacias84
Champion

I don't think hot buckets are faster. Hot and warm buckets occupy the same disk. I think the only main difference is that hot buckets are open for write operations. I do know that when Splunk restarts, hot buckets are immediately rolled to warm. Have you thought about segmenting your data into different indexes based on event? Also, how much search optimization have you done, how many concurrent searches are running, and can you use summary indexing to roll up your events into smaller buckets? Do your extractions use a lot of regex or delims to break data?
