My cluster master is currently reporting 18 pending fixup tasks, which is preventing me from meeting my Search Factor and Replication Factor:
8 under the Search Factor category:
Cannot fix search count as the bucket hasn't rolled yet.
10 under the Replication Factor category:
Cannot replicate as bucket hasn't rolled yet.
Do people typically wait something like this out?
In an indexer clustering environment, when data starts to be ingested on one indexer it is indexed locally in a hot bucket and, depending on the Replication Factor, also streamed to the other indexers. If for some reason the replication stream for that bucket breaks, the cluster master throws errors like the ones above: data is being indexed on the originating indexer but is not being streamed to the others, so the cluster falls out of compliance with its Replication Factor (RF) and Search Factor (SF).
In Splunk Web, you will see messages for these buckets when you navigate to Settings > Indexer Clustering > Indexes and click the "Bucket Status" button. Look at the bucket name and you will see a name like:
summary_forwarders~36~A1688691-A0AE-493C-A8C8-5300DEB73388
where
summary_forwarders > name of the index
36 > bucket ID
A1688691-A0AE-493C-A8C8-5300DEB73388 > GUID of the peer where the bucket originated
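If you want to pull those pieces apart from the command line, the bucket name is simply three fields separated by "~". A rough shell one-liner, for illustration only, using the example bucket above:
echo "summary_forwarders~36~A1688691-A0AE-493C-A8C8-5300DEB73388" | awk -F'~' '{print "index="$1, "bucket_id="$2, "origin_guid="$3}'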
To get the peer name from the GUID, run the search below on the Cluster Master (the last segment of the id column is the peer GUID):
| rest /services/cluster/master/peers | rename label AS Peer | table Peer id
=====output====
2p2262 https://127.0.0.1:24501/services/cluster/master/peers/A1688691-A0AE-493C-A8C8-5300DEB73388
2p1262 https://127.0.0.1:24501/services/cluster/master/peers/E26613B0-8146-466F-B33B-37370B2C7197
Normally, such error messages will eventually clear when the hot bucket rolls to warm, but that can take some time, since Splunk buckets can grow to roughly 8 GB to 12 GB depending on the index configuration.
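For reference, the hot bucket size cap is controlled per index by maxDataSize in indexes.conf. A minimal sketch, with a placeholder stanza name (auto is roughly 750 MB, auto_high_volume roughly 10 GB on 64-bit systems):
[YOUR_INDEX_HERE]
maxDataSize = auto_high_volume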
To resolve this issue right away, you could force a hot-to-warm roll of all hot buckets of the impacted index:
$SPLUNK_HOME/bin/splunk _internal call /data/indexes/YOUR_INDEX_HERE/roll-hot-buckets -auth admin:password
or curl:
curl -k -u admin:changeme https://INDEXER:MGMT_PORT/services/data/indexes/YOUR_INDEX_HERE/roll-hot-buckets -X POST
This needs to be done on the indexer where the bucket originated.
In Splunk 6.5.3 and higher, buckets can also be rolled from the UI.
How can you fix this issue when you have thousands of buckets pending? I have around 4800 buckets and I cannot go and execute the roll command on each and every index, right? Can you give me a solution for this, please?
Thanks for the answer.
Splunk 6.4+ comes with an easy option to roll buckets remotely. You can ask your cluster master to identify the cluster peer (remote indexer) and roll the bucket automatically.
Execute the curl below:
curl -k -u username:password https://localhost:<cm_mgmt_port>/services/cluster/master/control/control/roll-hot-buckets -X POST -d "bucket_id=<...>"
Just make sure to use the correct management port of your cluster master and the bucket_id as it appears in the error messages.
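For example, assuming the bucket_id is the full bucket name shown on the Bucket Status page (like the example bucket earlier in this thread) and the cluster master is on the default management port 8089, the call would look something like this (substitute your own credentials, host, port, and bucket name):
curl -k -u username:password https://localhost:8089/services/cluster/master/control/control/roll-hot-buckets -X POST -d "bucket_id=summary_forwarders~36~A1688691-A0AE-493C-A8C8-5300DEB73388"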
On the CM, if I see that more than 10 indexes have this issue, do I have to run the command below for every one of them?
curl -k -u admin:changeme https://HOST:PORT/services/data/indexes/YOUR_INDEX/roll-hot-buckets -X POST
kishor_pinjarkar, it looks like you're asking this question on multiple old posts.
That will work, but you will have to run it per index, i.e. 10 times (a loop sketch follows this reply).
There is also a roll bucket option under the Bucket Status section of the Indexer Clustering page in the GUI.
Alternatively, you can wait until the bucket rolls to warm, at which point it will fix itself.
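If you have a list of affected indexes, a rough shell loop saves running the command by hand for each one. The index names, host, port, and credentials below are placeholders; adjust before use:
# loop over the affected indexes and force a hot-to-warm roll on each
for idx in index1 index2 index3; do
  curl -k -u admin:changeme -X POST "https://HOST:MGMT_PORT/services/data/indexes/${idx}/roll-hot-buckets"
done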
I keep restarting my cluster (2 peers) and I keep ending up in the same boat with the above error.
How did you resolve it?
Normally it'll resolve itself after some time. If you don't mind restarts, try the following (a CLI sketch follows):
stop the cluster master
restart the peers
start the cluster master
(and make sure your RF/SF <= 2)
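A rough command-line version of those steps, assuming the splunk binary is on your PATH (run each command on the host indicated in the comment):
# on the cluster master
splunk stop
# on each peer
splunk restart
# back on the cluster master
splunk start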
Use the cluster master to restart the peers. Restart the cluster master, and then run:
splunk rolling-restart cluster-peers
See the docs here. It is not good practice to manually restart peers...
The cluster actually resolved the issue itself once the bucket rolled (within a few hours). In my case I just had to be patient and resist the urge to feel that something had to be done by me. We originally started using clustering in 5.0.3 and the feature is light years ahead now.
So my problem is resolved. The issue was that I had restarted both cluster peers and still had the error. As soon as I restarted the cluster master, the issue was resolved (after several minutes of waiting for the remaining items to complete).
It seems to me that when you see this issue, restarting the cluster master resolves it.
The cluster certainly resolved the replication and search count events.
All Data is Searchable
Search Factor is Met
Replication Factor is Met