
Why is cluster master reporting "Cannot fix search count as the bucket hasn't rolled yet.", preventing me from meeting my Search Factor?

LiquidTension
Path Finder

My cluster master currently reports 18 pending fixup tasks, which are preventing me from meeting my Search Factor:

8 in the Search Factor category:

Cannot fix search count as the bucket hasn't rolled yet.

10 in the Replication Factor category:

Cannot replicate as bucket hasn't rolled yet.

Do people typically wait something like this out?


rbal_splunk
Splunk Employee

In an indexer clustering environment, when data starts being ingested on one indexer, it is written to a hot bucket and indexed locally, and it is also streamed to other indexers according to the Replication Factor. If that stream breaks for some reason, the cluster master reports errors like the ones above: data is still being indexed on the originating indexer but is no longer streamed to the other peers, so the cluster is out of compliance with its RF and SF.

In Splunk Web, you can see messages for these buckets by navigating to Settings > Indexer clustering > Indexes and clicking the "Bucket Status" button. The bucket names look like this:
summary_forwarders~36~A1688691-A0AE-493C-A8C8-5300DEB73388

where:
summary_forwarders = index name
36 = bucket ID
A1688691-A0AE-493C-A8C8-5300DEB73388 = GUID of the peer where the bucket originated
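Because the three parts are separated by "~", they can be pulled apart mechanically. A minimal POSIX shell sketch, assuming the index name itself contains no "~" (parse_bucket_name is a hypothetical helper, not a Splunk command):

```shell
#!/bin/sh
# Sketch: split a clustered bucket name of the form
#   <index>~<bucket id>~<GUID of originating peer>
# into its three parts. Assumes the index name contains no "~".
parse_bucket_name() (
  # Split on "~"; the last variable receives whatever is left over.
  IFS='~' read -r index bucket_id guid <<EOF
$1
EOF
  if [ -z "$index" ] || [ -z "$bucket_id" ] || [ -z "$guid" ]; then
    echo "not a clustered bucket name: $1" >&2
    exit 1   # exits only the function's subshell
  fi
  printf 'index=%s\nbucket_id=%s\norigin_guid=%s\n' "$index" "$bucket_id" "$guid"
)

parse_bucket_name 'summary_forwarders~36~A1688691-A0AE-493C-A8C8-5300DEB73388'
```

For the example name above it prints index=summary_forwarders, bucket_id=36 and origin_guid=A1688691-A0AE-493C-A8C8-5300DEB73388 on three lines. The function body is a subshell so its variables do not leak into the caller.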

To get the peer name from the GUID, run the following search on the cluster master:

| rest /services/cluster/master/peers | table label id | rename label AS Peer, id AS last_string_is_guid

Output:

2p2262      https://127.0.0.1:24501/services/cluster/master/peers/A1688691-A0AE-493C-A8C8-5300DEB73388   
2p1262      https://127.0.0.1:24501/services/cluster/master/peers/E26613B0-8146-466F-B33B-37370B2C7197  
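The GUID is the last path segment of each peer's id URL, so matching a bucket's GUID to a peer name is a plain string comparison. A small sketch, assuming the search results have been saved as whitespace-separated label and id columns like the output above (peer_for_guid is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: read "<peer label> <peer id URL>" lines on stdin and print the
# label of the peer whose URL ends in the GUID given as $1.
peer_for_guid() {
  awk -v guid="$1" '
    {
      n = split($2, parts, "/")          # last "/" segment is the GUID
      if (toupper(parts[n]) == toupper(guid)) { print $1; found = 1 }
    }
    END { exit (found ? 0 : 1) }         # non-zero if no peer matched
  '
}

peer_for_guid A1688691-A0AE-493C-A8C8-5300DEB73388 <<'EOF'
2p2262      https://127.0.0.1:24501/services/cluster/master/peers/A1688691-A0AE-493C-A8C8-5300DEB73388
2p1262      https://127.0.0.1:24501/services/cluster/master/peers/E26613B0-8146-466F-B33B-37370B2C7197
EOF
```

With the sample output it prints 2p2262, the peer where summary_forwarders~36~A1688691-... originated, which is where the roll below has to happen.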

Normally, such error messages clear on their own once the hot bucket rolls to warm, but that can take a while, because depending on the index configuration a hot bucket can grow as large as 8 to 12 GB before it rolls.

To resolve the issue right away, you can force a hot-to-warm roll of all hot buckets in the affected index:

$SPLUNK_HOME/bin/splunk _internal call /data/indexes/YOUR_INDEX_HERE/roll-hot-buckets -auth admin:password

or, with curl:

curl -k -u admin:changeme https://INDEXER:MGMT_PORT/services/data/indexes/YOUR_INDEX_HERE/roll-hot-buckets -X POST

Run this against the affected index on the indexer where the bucket originated (the peer identified by the GUID in the bucket name).


jbarlow_splunk
Splunk Employee

In 6.5.3 and higher, buckets can also be rolled from the UI:

  1. In Settings, under the Distributed Environment group, click Indexer clustering. This takes you to the master dashboard.
  2. Select the Indexes tab.
  3. Click Bucket Status.
  4. Find the bucket and click Action > Roll.

Karthik
Engager

How can you fix this issue when you have thousands of pending buckets? I have around 4,800 buckets, and I can't run the roll command on each and every index, right? Can you suggest a solution?

Thanks for the answer.


@jbarlow_splunk 


anilyelmar
Explorer

Splunk 6.4+ offers an easy way to roll buckets remotely: you can have the cluster master identify the cluster peer (remote indexer) that holds the bucket and roll it for you.

Run the following curl command:
curl -k -u username:password https://localhost:MGMT_PORT/services/cluster/master/control/control/roll-hot-buckets -X POST -d "bucket_id=<...>"

Just make sure to use the correct management port of your cluster master and the bucket_id exactly as it appears in the error messages.
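For many pending buckets (as in the 4,800-bucket case above), the same cluster master call can be repeated in a loop. A sketch, assuming the affected bucket IDs have already been collected into a file, one per line; CM_URL, SPLUNK_AUTH and DRY_RUN are hypothetical variable names, and 8089 is only Splunk's default management port:

```shell
#!/bin/sh
# Sketch: ask the cluster master to roll the hot bucket for each bucket ID
# read from stdin (one per line), via the control/roll-hot-buckets endpoint.
# CM_URL, SPLUNK_AUTH and DRY_RUN are hypothetical variable names.
: "${CM_URL:=https://localhost:8089}"     # assumed: default management port
: "${SPLUNK_AUTH:=admin:changeme}"        # placeholder credentials
roll_buckets_via_cm() {
  url="$CM_URL/services/cluster/master/control/control/roll-hot-buckets"
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r bucket_id || [ -n "$bucket_id" ]; do
    [ -n "$bucket_id" ] || continue       # skip blank lines
    if [ -n "$DRY_RUN" ]; then
      echo "POST $url bucket_id=$bucket_id"   # show the request only
    elif ! curl -k -sS -f -o /dev/null -u "$SPLUNK_AUTH" -X POST "$url" \
        --data-urlencode "bucket_id=$bucket_id"; then
      echo "failed: $bucket_id" >&2       # report and continue with the next ID
    fi
  done
}
```

For example, DRY_RUN=1 roll_buckets_via_cm < bucket_ids.txt (a hypothetical file) prints one request per bucket without sending anything; unset DRY_RUN to actually send them.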

kishor_pinjark2
Path Finder

On the CM, I see more than 10 indexes with this issue. Do I have to run the command below for every index?

curl -k -u admin:changeme https://HOST:PORT/services/data/indexes/YOUR_INDEX/roll-hot-buckets -X POST


gjanders
SplunkTrust

kishor_pinjarkar, it looks like you're asking this question on multiple old posts.

That will work, but you will have to run it once per index, i.e. 10 times.
There is also a roll bucket option under the Bucket Status section of the indexer clustering page in the GUI.

Alternatively, you can wait until the bucket rolls to warm, at which point the bucket will fix itself.
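If several indexes are affected, the per-index roll-hot-buckets call can be looped instead of typed ten times. A sketch, assuming INDEXER_URL points at the peer where the buckets originated; INDEXER_URL, SPLUNK_AUTH and DRY_RUN are hypothetical variable names, and 8089 is only the default management port:

```shell
#!/bin/sh
# Sketch: roll hot buckets for several indexes on ONE indexer by repeating
# the per-index /services/data/indexes/<index>/roll-hot-buckets call.
# INDEXER_URL, SPLUNK_AUTH and DRY_RUN are hypothetical variable names.
: "${INDEXER_URL:=https://localhost:8089}"   # assumed management URI
: "${SPLUNK_AUTH:=admin:changeme}"           # placeholder credentials
roll_indexes() {
  status=0
  for index in "$@"; do
    url="$INDEXER_URL/services/data/indexes/$index/roll-hot-buckets"
    if [ -n "$DRY_RUN" ]; then
      echo "POST $url"                       # show the request only
    elif ! curl -k -sS -f -o /dev/null -u "$SPLUNK_AUTH" -X POST "$url"; then
      echo "failed to roll $index" >&2
      status=1                               # remember that something failed
    fi
  done
  return "$status"
}

DRY_RUN=1 roll_indexes summary_forwarders main
```

The dry run prints the two POST URLs it would send; the function returns non-zero if any real request fails, so it can be used in a larger script.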

BP9906
Builder

I keep restarting my cluster (2 peers) and I keep ending up in the same boat with the above error.

How did you resolve it?


dxu_splunk
Splunk Employee

Normally it'll resolve itself after some time. If you don't mind restarts, try:

stop cluster master
restart peers
start cluster master

(and make sure your RF and SF are both <= 2, since you have 2 peers)
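The "<= 2" advice follows from two rules: a cluster cannot meet a replication factor larger than its number of peers, and the search factor cannot exceed the replication factor. A quick sanity check with the values supplied by hand (check_factors is a hypothetical helper; the real settings are replication_factor and search_factor in the [clustering] stanza of server.conf on the cluster master):

```shell
#!/bin/sh
# Sketch: check replication/search factors against the peer count.
# check_factors is a hypothetical helper; pass RF, SF and peer count.
check_factors() {
  rf=$1 sf=$2 peers=$3
  if [ "$rf" -gt "$peers" ]; then
    echo "replication_factor $rf > $peers peers: RF can never be met"
    return 1
  fi
  if [ "$sf" -gt "$rf" ]; then
    echo "search_factor $sf > replication_factor $rf: invalid"
    return 1
  fi
  echo "ok: RF=$rf SF=$sf with $peers peers"
}

check_factors 2 2 2          # the 2-peer cluster in this thread
check_factors 3 2 2 || true  # fails: RF 3 needs at least 3 peers
```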

lguinn2
Legend

Use the cluster master to restart the peers: restart the cluster master, and then run

splunk rolling-restart cluster-peers

See the Splunk documentation on rolling restarts. It is not good practice to restart peers manually.

LiquidTension
Path Finder

The cluster actually resolved the issue itself once the bucket rolled (within a few hours). In my case I just had to be patient and resist the urge to intervene. We originally started using clustering in 5.0.3, and the feature is light years ahead now.

BP9906
Builder

So my problem is resolved. I had restarted both cluster peers and still had the issue. As soon as I restarted the cluster master, the issue was resolved (after several minutes of waiting for the remaining items to complete).

Seems to me that when seeing this issue, restarting the cluster master resolves it.


LiquidTension
Path Finder

The cluster certainly resolved the replication and search count events.

All Data is Searchable

Search Factor is Met
Replication Factor is Met
