Getting Data In

Why is there a Splunk indexer data replication failure?

kamalbeg
Explorer

I have an indexer cluster spanning site1 and site2, with 3 indexers in each site. I had a disk space issue on 2 servers in site2. After the disk space was fixed, I see the following error message on one of my indexers:

Search peer xxxxxx.com has the following message: Too many streaming errors to target=xx.xx.xxx.xxx:9887. Not rolling hot buckets on further errors to this target. (This condition might exist with other targets too. Please check the logs)

I also see the following on my indexer cluster master:
Replication factor not met
Search factor not met
All data is searchable
Indexing Ready YES

Under the bucket status for some indexers, I see the following
streaming failure - src=380B6B-F724-46C4-9D13-D93CDE6A6C82 tgt=8E4CA44F-CA6D-4254-BD47-EB70B6C3D4AD failing=tgt

Any suggestions on how to fix this issue? I do not have access to Splunk support, so I can't really open tickets.

1 Solution

nickhills
Ultra Champion

It sounds like your cluster is "fixing up". It may take a while to bring all the buckets back into service and restore the replication factor (RF) and search factor (SF).

On your cluster master, go to Settings -> Indexer Clustering -> Indexes -> Bucket Status.
You should see buckets listed under "fixup tasks in progress" and "fixup tasks pending".
If so, you just need to give the cluster time.
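You can also watch the same status from the command line instead of the UI. A minimal sketch, run on the cluster master; the credentials and host are placeholders for your environment, and the `cluster/master/fixup` endpoint name is taken from the Splunk REST reference:

```shell
# On the cluster master: summary of peers, indexes, and RF/SF status
splunk show cluster-status

# Or query pending fixup tasks directly via the REST API
# (replace admin:changeme with your own credentials):
curl -k -u admin:changeme \
  "https://localhost:8089/services/cluster/master/fixup?level=replication_factor&output_mode=json"
```

If the task list shrinks between runs, the cluster is recovering on its own.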

If you put the cluster in maintenance mode, make sure you remembered to take it out!
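To check that from the CLI, something like the following should work on the cluster master (these are standard Splunk CLI commands, but verify against your version's docs):

```shell
# On the cluster master: report whether maintenance mode is on
splunk show maintenance-mode

# If it is still on, turn it off so bucket fixup can proceed:
splunk disable maintenance-mode
```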

If my comment helps, please give it a thumbs up!


kamalbeg
Explorer

Thanks for your reply. "Fixup tasks in progress" is 0, as it's all caught up. "Fixup tasks pending" says 212, and it has been there for a while now; I just see the number growing. So you are saying that these counts will go down as time progresses?


nickhills
Ultra Champion

Yes,

If you have buckets that had not finalised when you had the failure, you have to wait for the indexer that holds the single copy to 'roll' that bucket. Once it has been rolled, it will be replicated to the other peers and fixed up along the way.

As long as all data is searchable I would wait a bit longer for Splunk to recover automatically before considering manually rolling any buckets.
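If you do eventually need to roll a hot bucket by hand, it can be done per index over the REST API. This is only a sketch: run it on the indexer that holds the stuck bucket (not the cluster master), the endpoint name is per the Splunk REST reference, and `<index_name>`, host, and credentials are placeholders for your environment:

```shell
# On the affected indexer: force the hot buckets of one index to roll to warm
curl -k -u admin:changeme -X POST \
  "https://localhost:8089/services/data/indexes/<index_name>/roll-hot-buckets"
```

Again, only reach for this after giving automatic recovery a fair chance.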

How long have there been no jobs in progress?
Sometimes the CM can get a bit confused; restarting the CM will force it to remove any stale jobs and re-evaluate the status of the cluster.

If my comment helps, please give it a thumbs up!

kamalbeg
Explorer

Data is searchable. The counts are going down, but slowly. The disk space issue was fixed yesterday. Do you think restarting the cluster master will expedite the sync, or should I wait 24 to 48 hours and monitor the progress? My pending tasks went down from 212 to 209.
