I have indexer clustering across two sites, site1 and site2, with 3 indexers in each site. I had a disk space issue on 2 servers in site2. After getting the disk space fixed, I see the following error message on one of my indexers:
Search peer xxxxxx.com has the following message: Too many streaming errors to target=xx.xx.xxx.xxx:9887. Not rolling hot buckets on further errors to this target. (This condition might exist with other targets too. Please check the logs)
I also see the following on my indexer cluster master:
Replication factor not met
Search factor not met
All data is searchable
Indexing Ready YES
Under the bucket status for some indexers, I see the following:
streaming failure - src=3D380B6B-F724-46C4-9D13-D93CDE6A6C82 tgt=8E4CA44F-CA6D-4254-BD47-EB70B6C3D4AD failing=tgt
Any suggestions on how to fix this issue? I do not have access to Splunk support, so I can't really open tickets.
It sounds like your cluster is "fixing up". It may take a while to bring all the buckets back into service and restore RF/SF.
On your cluster master, go to Settings -> Indexer Clustering -> Indexes -> Bucket Status.
You should have buckets listed under "Fixup tasks in progress" and "Fixup tasks pending".
If so, you just need to give the cluster time.
If you put the cluster in maintenance mode, make sure you remembered to take it out!
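If you prefer the command line, you can check the same things from the cluster master. This is just a sketch, assuming default paths and that the commands are run on the CM itself:

    # Overall cluster health, peers, and RF/SF status
    $SPLUNK_HOME/bin/splunk show cluster-status

    # Confirm maintenance mode is off (it blocks most bucket fixup activity)
    $SPLUNK_HOME/bin/splunk show maintenance-mode
    $SPLUNK_HOME/bin/splunk disable maintenance-mode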
Thanks for your reply. Fixup tasks in progress is 0, as it's all caught up. Fixup tasks pending says 212, and it has been there for a while now; I just see the number growing. So you are saying that these counts will go down as time progresses?
Yes.
If you have buckets that had not finalised when you had the failure, you have to wait for the indexer that holds the single copy to 'roll' that bucket. Once it has been rolled, it will be replicated to the other peers and fixed up along the way.
As long as all data is searchable, I would wait a bit longer for Splunk to recover automatically before considering manually rolling any buckets.
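If you do get to the point of rolling a hot bucket manually, it is done per index on the indexer that still holds the single copy. A minimal sketch only; 'main' is a placeholder index name and admin:changeme a placeholder credential, so adjust both and confirm the endpoint against your version before running it:

    # Run on the indexer holding the stuck hot bucket; rolls that index's hot buckets to warm
    $SPLUNK_HOME/bin/splunk _internal call /data/indexes/main/roll-hot-buckets -method POST -auth admin:changeme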
How long have there been no jobs in progress?
Sometimes the CM can get a bit confused; restarting it will force it to remove any stale jobs and re-evaluate the status of the cluster.
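If you do restart the CM, it is just a normal Splunk restart on that box, and afterwards you can watch the fixup queue shrink from a search. A sketch, assuming the standard fixup REST endpoint (field names can vary slightly by version):

    # On the cluster master
    $SPLUNK_HOME/bin/splunk restart

    # SPL run from the CM, listing buckets still pending replication-factor fixup
    | rest /services/cluster/master/fixup level=replication_factor splunk_server=local
    | table title, index, initial.reason, latest.reason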
Data is searchable. The counts are going down, but slowly. The disk space issue got fixed yesterday. Do you think recycling the cluster master will expedite the sync, or should I wait 24 to 48 hours and monitor the progress? My pending tasks went down from 212 to 209.