
What does Splunk do when one index on an indexer reaches its maximum capacity?

vincenteous
Communicator

Hi all,

I'm currently having a problem with storage on one of my indexers. Here's a brief summary of my setup:

  • 1 Search Head instance
  • 3 Indexer instances
  • Several Universal Forwarders, configured to send data to all 3 indexers in load-balance mode

Of the indexes configured on all three indexers, one index (let's call it "SMS") on Indexer B has already reached its maximum configured size. My question: if the forwarders keep sending data to all indexers in load-balance mode, will they skip sending "SMS" data to Indexer B because its maximum capacity has been reached?

Thanks in advance.

Best Regards,

Vincent


ddrillic
Ultra Champion

The following page covers this: Managing Indexers and Clusters of Indexers

It says: "To set the maximum index size on a per-index basis, use the maxTotalDataSizeMB attribute. When this limit is reached, buckets begin rolling to frozen."
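To make "buckets begin rolling to frozen" concrete, here is a minimal sketch of that behaviour for one index on one indexer. It assumes buckets are frozen oldest first and only as whole units, and it ignores the hot/warm/cold distinction, separate homePath/coldPath limits, and time-based retention (frozenTimePeriodInSecs). The Bucket class and enforce_max_total_size function are hypothetical illustrations, not Splunk code.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    bucket_id: int  # in this toy model, a lower id means an older bucket
    size_mb: int


def enforce_max_total_size(buckets, max_total_mb):
    """Freeze the oldest buckets until the index fits within max_total_mb.

    Hypothetical simplification: buckets are ordered oldest first and are
    always frozen whole. Returns (remaining buckets, frozen buckets).
    """
    remaining = sorted(buckets, key=lambda b: b.bucket_id)
    frozen = []
    while remaining and sum(b.size_mb for b in remaining) > max_total_mb:
        frozen.append(remaining.pop(0))  # oldest bucket goes first (FIFO)
    return remaining, frozen


if __name__ == "__main__":
    index = [Bucket(1, 400), Bucket(2, 400), Bucket(3, 400)]
    # A new 300 MB bucket pushes the index over a 1200 MB limit.
    index.append(Bucket(4, 300))
    kept, frozen = enforce_max_total_size(index, max_total_mb=1200)
    print("frozen:", [b.bucket_id for b in frozen])
    print("kept:", [b.bucket_id for b in kept])
```

In this example only bucket 1 is frozen, because removing it brings the total back to 1100 MB, which is under the limit.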

woodcock
Esteemed Legend

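When the index hits its limit, Splunk rolls the oldest buckets to frozen to make room for new events (first in, first out). By default, frozen buckets are deleted unless coldToFrozenDir or coldToFrozenScript is configured. Each freeze is logged in _internal with a line like this: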

07-24-2014 01:30:51.609 +0200 INFO BucketMover - will attempt to freeze: candidate='/opt/splunk/var/lib/splunk/rest/db/db_#######_#######_#' because 
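If you export these _internal events and want to count how often buckets are being frozen, a small parser can pull out the timestamp and bucket path. This is a minimal sketch that assumes the exact line shape shown above (timestamp with milliseconds and UTC offset, INFO BucketMover, then candidate='...'); the regex and parse_freeze_event are hypothetical and do not cover other BucketMover messages.

```python
import re
from datetime import datetime

# Hypothetical pattern matching only the freeze message quoted in this thread;
# it is not an official Splunk log grammar.
FREEZE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"INFO BucketMover - will attempt to freeze: candidate='(?P<path>[^']+)'"
)


def parse_freeze_event(line):
    """Return (timestamp, bucket path) for a freeze line, or None otherwise."""
    m = FREEZE_RE.match(line)
    if m is None:
        return None
    # Month-day-year, milliseconds (%f accepts 3 digits), and a +HHMM offset.
    ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
    return ts, m.group("path")


if __name__ == "__main__":
    sample = (
        "07-24-2014 01:30:51.609 +0200 INFO BucketMover - will attempt to "
        "freeze: candidate='/opt/splunk/var/lib/splunk/rest/db/db_#######_#######_#' because "
    )
    print(parse_freeze_event(sample))
```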

vincenteous
Communicator

Is there any way to configure how much data is rolled to frozen at a time? In my case, sometimes too much data is removed and sometimes too little (judging by the oldest event I can find with a search).


yannK
Splunk Employee

No. The bucket is the smallest unit of storage, so whole buckets are always frozen at once.

Over the long term, you can configure smaller hot buckets to avoid overly large ones (by default they can grow up to 10GB); try 500MB to start.
But avoid making them too small, because many small buckets hurt performance (especially in a cluster).
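Because only whole buckets are frozen, the amount of data removed is effectively rounded up to bucket-sized steps, which is why smaller buckets make the trimming more precise. The sketch below assumes every bucket is exactly the same size, which real buckets are not; data_freed_mb is a hypothetical helper used only to show the arithmetic.

```python
import math


def data_freed_mb(overage_mb, bucket_size_mb):
    """Data removed when equal-sized whole buckets are frozen to clear an overage.

    Hypothetical simplification: all buckets are exactly bucket_size_mb, and
    since only whole buckets can be frozen, the overage is rounded up to the
    next multiple of the bucket size.
    """
    if overage_mb <= 0:
        return 0
    return math.ceil(overage_mb / bucket_size_mb) * bucket_size_mb


if __name__ == "__main__":
    overage = 600  # the index is 600 MB over its limit
    for size in (10 * 1024, 500):
        freed = data_freed_mb(overage, size)
        print(f"{size} MB buckets: {freed} MB frozen ({freed - overage} MB more than needed)")
```

With 10GB buckets, clearing a 600 MB overage freezes a full 10240 MB; with 500MB buckets, it freezes 1000 MB.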


vincenteous
Communicator

Thanks for the explanation, yannK.
One more question: is there a recommended ratio between the maximum index size and the maximum hot bucket size?


yannK
Splunk Employee

To avoid warnings, keep the maximum hot bucket size (maxDataSize) smaller than maxTotalDataSizeMB.

But you also want the buckets to be large enough to avoid creating too many of them, which hurts performance.

Experiment: the right values depend on your daily ingestion volume and the time range your data covers.
The | dbinspect command is useful for checking how your data is distributed across buckets.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Dbinspect
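As a starting point for that experimenting, the trade-off can be roughed out with simple arithmetic. This sketch is a hypothetical planning helper, not a Splunk tool: daily_disk_mb is assumed to be the on-disk growth per day (which you would estimate from the bucket sizes that | dbinspect reports), and it ignores other reasons buckets roll, such as time spans and restarts.

```python
def sizing_summary(max_data_size_mb, max_total_data_size_mb, daily_disk_mb):
    """Rough planning arithmetic for one index on one indexer (hypothetical helper).

    max_data_size_mb: intended maximum hot bucket size (maxDataSize)
    max_total_data_size_mb: index size limit (maxTotalDataSizeMB)
    daily_disk_mb: estimated on-disk MB the index grows by per day
    """
    if max_data_size_mb >= max_total_data_size_mb:
        raise ValueError("hot bucket size should be below maxTotalDataSizeMB")
    return {
        # Roughly how many full-size buckets fit before freezing starts.
        "approx_buckets": max_total_data_size_mb // max_data_size_mb,
        # Roughly how many days of data the index can hold at this rate.
        "approx_retention_days": max_total_data_size_mb / daily_disk_mb,
        # How many buckets fill up per day if they roll only by size.
        "buckets_per_day": daily_disk_mb / max_data_size_mb,
    }


if __name__ == "__main__":
    print(sizing_summary(max_data_size_mb=500, max_total_data_size_mb=50_000, daily_disk_mb=2_000))
```

Here a 50,000 MB index with 500 MB buckets holds about 100 buckets, or roughly 25 days at 2,000 MB per day.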

vincenteous
Communicator

Noted, yannK. Thanks for your help.


somesoni2
Revered Legend

Data retention is set per index on each indexer, so the forwarders will keep sending data to all three indexers, and Indexer B will delete its old buckets to make room for new incoming data.

vincenteous
Communicator

So this means there is a risk of data loss, correct? I'm also confused because sometimes the forwarders send data to only one indexer and ignore the rest, even though they're in load-balance mode.


emiller42
Motivator

Forwarder load balancing is a little subtle. A forwarder switches targets at a regular interval (by default every 30 seconds, set by autoLBFrequency in outputs.conf). This means that at any given time, a forwarder is sending to only one indexer. It isn't round robin; instead, the forwarder regularly re-randomizes its indexer list.
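A toy model of this time-based switching helps show why a forwarder is only ever talking to one indexer at a time. It assumes the forwarder simply picks an indexer at random at each autoLBFrequency boundary and ignores the "safe to switch" rules; lb_schedule and the indexer names are hypothetical, and the real selection logic may differ.

```python
import random


def lb_schedule(indexers, duration_s, auto_lb_frequency_s=30, seed=None):
    """Return (start_second, indexer) pairs, one target per switch interval.

    Toy model only: at each interval boundary a target is chosen at random,
    so at any moment the forwarder sends to exactly one indexer.
    """
    rng = random.Random(seed)
    return [(t, rng.choice(indexers)) for t in range(0, duration_s, auto_lb_frequency_s)]


if __name__ == "__main__":
    for start, target in lb_schedule(["indexerA", "indexerB", "indexerC"], 180, seed=1):
        print(f"t={start:>3}s -> {target}")
```

Because the choice is random rather than round robin, the same indexer can be picked for several intervals in a row, so short observation windows can look lopsided.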

However, it switches only when it's considered 'safe' to do so, to avoid sending half of an event to Indexer A and the other half to Indexer B. 'Safe' means reaching EOF on a file read, or 10 seconds of inactivity on a TCP connection.

So if your forwarders can't keep up with file writes, they never reach EOF and can get 'stuck' on one indexer, stretching that 30-second period considerably.

To mitigate this, you can set forceTimebasedAutoLB = true (also in outputs.conf), but then you risk events being split between indexers. I wouldn't recommend it.
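The trade-off between waiting for a safe point and forcing the switch on time can be shown with a small simulation of a monitored file. This is a sketch under stated assumptions: each second the forwarder reads one chunk, an event may span several chunks, and a chunk is a safe point only if it ends an event and the reader has hit EOF. TCP idle detection is not modeled, indexers are cycled in order for readability rather than randomized, and simulate, split_events, and their parameters are hypothetical stand-ins for autoLBFrequency and forceTimebasedAutoLB.

```python
import itertools


def simulate(chunks, auto_lb_frequency_s=30, force_time_based=False,
             indexers=("indexerA", "indexerB", "indexerC")):
    """Assign each one-second chunk of a file stream to an indexer.

    chunks: list of (event_id, at_safe_point) pairs, one per second.
    A switch becomes due after auto_lb_frequency_s seconds; it happens only
    at a safe point unless force_time_based is True.
    Returns a list of (event_id, indexer) pairs, one per chunk.
    """
    targets = itertools.cycle(indexers)
    current = next(targets)
    since_switch = 0
    out = []
    for event_id, at_safe_point in chunks:
        out.append((event_id, current))
        since_switch += 1
        due = since_switch >= auto_lb_frequency_s
        if due and (force_time_based or at_safe_point):
            current = next(targets)
            since_switch = 0
    return out


def split_events(assignments):
    """Return ids of events whose chunks went to more than one indexer."""
    seen = {}
    for event_id, indexer in assignments:
        seen.setdefault(event_id, set()).add(indexer)
    return sorted(e for e, idxs in seen.items() if len(idxs) > 1)


if __name__ == "__main__":
    # A busy file: each event spans 4 chunks and the reader never reaches EOF.
    busy = [(t // 4, False) for t in range(120)]
    stuck = simulate(busy)
    print("indexers used (safe switching):", sorted({idx for _, idx in stuck}))
    forced = simulate(busy, force_time_based=True)
    print("split events (forced switching):", split_events(forced))
```

In the busy-file case, safe switching keeps every chunk on indexerA for the whole two minutes, while forced switching spreads the load but splits events 7 and 22 across two indexers.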

It's also worth noting that the forwarder knows nothing about the state of an indexer beyond it being a valid target for data. It doesn't know whether a particular index is full.

vincenteous
Communicator

I see now. I had misunderstood this the whole time.
Thanks for the explanation, emiller.


vincenteous
Communicator

Noted. Thank you for the documentation.
