Hello, I have a strange question, and my description of it may be a bit rough.
I have a single-site cluster that contains 5 indexers, 4 search heads, a deployer, a cluster master, some deployment servers, some heavy forwarders, and some universal forwarders. The deployment server also acts as a heavy forwarder.
The indexer cluster's search factor is 2 and its replication factor is 3. The universal forwarders monitor log files and forward to the HFs, and the HFs forward to the indexer cluster.
Strange things keep happening for no apparent reason. After the cluster has been running for a while, events for some sourcetypes get duplicated; sometimes each event is repeated 5 times. If I restart the heavy forwarders, the duplication disappears and the whole cluster returns to normal, but sometimes I need to restart the universal forwarders instead for it to work.
Events for some sourcetypes have since been duplicated again, and I need to restart the HF or UF to return to a normal state.
I tried to find the cause in the indexers' splunkd.log, but I didn't find any clues.
I suspect index replication has a problem, but I couldn't find any error logs. Why does everything return to normal when I restart the HF or UF?
In addition to @woodcock's great answer, you should avoid the intermediate HF unless you need it for a specific purpose. UFs distribute events among indexers better than an HF does, and the HF can actually make the indexers work harder to process events.
If you eliminate the HF, be sure to set useACK = true on the UFs.
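For reference, a minimal sketch of what that could look like in outputs.conf on the UFs, assuming placeholder indexer hostnames and the default 9997 receiving port:

[tcpout]
defaultGroup = idx_cluster

[tcpout:idx_cluster]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
useACK = true

With the UFs pointing at all the indexers directly, they load-balance across the list, and useACK guards against losing data in flight.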
I've seen something similar before, and for us it seemed to be due to a misbehaving indexer. When you search the events and see duplicates, what does the splunk_server field show? The splunk_server field shows which indexer the search is pulling the event from. In our case, every set of duplicates had one server in common, while the other indexers were distributed across the duplicates. We identified the problem indexer and took it out of the cluster, and that resolved the issue. Since your cluster is single-site, you shouldn't have an issue with search affinity.
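If it helps, one quick way to surface the duplicates and the indexers holding them is a search along these lines (my_sourcetype is a placeholder; adjust the time range to when you see duplicates):

index=* sourcetype=my_sourcetype earliest=-1h
| stats count values(splunk_server) AS servers BY _raw
| where count > 1

Rows where servers shows a single indexer common to every duplicate would point at a misbehaving indexer, as described above.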
Can you post your outputs.conf and inputs.conf?
For UF->HF set useACK to false, but for HF/UF->IDX set useACK to true. Also be sure to use EVENT_BREAKER everywhere.
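Sketched out with placeholder hostnames and the default 9997 receiving port: on the UFs, outputs.conf pointing at the HF would carry

[tcpout:hf_group]
server = hf1.example.com:9997
useACK = false

on the HF, outputs.conf pointing at the indexer cluster would carry

[tcpout:idx_cluster]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true

and props.conf on the UF would enable the event breaker per sourcetype (my_sourcetype is a placeholder; the regex assumes events are newline-delimited):

[my_sourcetype]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)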
What version of Splunk are you running?
What search heads are listed in the Cluster Master? (It should just be the search heads and the cluster master, not any of the other stuff)
What does your outputs.conf look like on the HF?
I am facing a similar issue on our cloud architecture, but our on-prem architecture has not had the issue mentioned above so far.
I suspect it may have something to do with timezones.