
Wrongly merged events / permanently blocked tcpout queue with an Intermediate Universal Forwarder.

hrawat_splunk
Splunk Employee

Using a Universal Forwarder as an intermediate forwarder for source universal forwarders can cause:

  1. Events being randomly merged into one event.
  2. A permanently blocked tcpout queue on the Intermediate Universal Forwarder (IUF).

Randomly merged events. There are a few scenarios where this can happen, assuming there is more than one IUF and more than one indexer.

Let's say a source UF partially reads an event and gets restarted. The partial event is sent to IUF1, and IUF1 immediately sends it to indexer1. The incomplete event now sits in the parsing queue of indexer1.

After the restart, the source UF sends the rest of the partially read event, plus a few other complete events, to IUF2 and indexer2. The source UF then switches back to IUF1 and starts sending more events. The moment IUF1 selects indexer1, indexer1 merges the previously saved partial event into a random new event from the same source file of that source UF.

There are other scenarios where a partial event waiting in an indexer's parsing queue will be merged with some random event from the same file.
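For reference, a minimal sketch of the load-balanced topology assumed above (hostnames, ports, and group names are hypothetical): each source UF load balances its tcpout across the two IUFs, and each IUF in turn load balances across the two indexers, which is what allows a partial event and its continuation to land on different indexers.

# outputs.conf on each source UF (hypothetical hosts/ports)
[tcpout]
defaultGroup = intermediate_uf_group

[tcpout:intermediate_uf_group]
server = iuf1.example.com:9997, iuf2.example.com:9997

# outputs.conf on each IUF (hypothetical hosts/ports)
[tcpout]
defaultGroup = indexer_group

[tcpout:indexer_group]
server = idx1.example.com:9997, idx2.example.com:9997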

Another problem with having an Intermediate Universal Forwarder load balance for source Universal Forwarders is a permanently blocked tcpout queue.

The following slides explain how a tcpout queue can become permanently blocked on an Intermediate Universal Forwarder:

https://conf.splunk.com/files/2019/slides/FN1570.pdf

 

 

1 Solution

hrawat_splunk
Splunk Employee

A Universal Forwarder should never be used as an intermediate forwarder. Instead, use a Heavy Forwarder as the intermediate forwarder (IHF).

The reason: if it were IHF1 instead of IUF1, the partial event would have been sitting in the parsing queue of IHF1. The moment the source UF connection goes away, the partial event in IHF1's parsing queue is discarded.

Partial events never leave the IHF and are never wrongly merged at the IHF layer.

Partial events also never get stuck in the tcpout queue of the IHF.

So it's highly recommended to use a Heavy Forwarder as the intermediate forwarder.
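As a rough configuration sketch of that recommendation (hostnames, ports, and group names are hypothetical), the source UFs simply point outputs.conf at the heavy forwarders; because each IHF is a full Splunk Enterprise instance, parsing and line breaking happen there, and a partial event is discarded along with the broken connection instead of being forwarded on.

# outputs.conf on each source UF, pointing at the IHF layer (hypothetical hosts/ports)
[tcpout]
defaultGroup = intermediate_hf_group

[tcpout:intermediate_hf_group]
server = ihf1.example.com:9997, ihf2.example.com:9997
useACK = true

useACK is optional and independent of the UF-vs-HF question; it is shown here only because indexer acknowledgement is often enabled on this hop to reduce data loss when queues block.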

 


DavidHourani
Super Champion

This is not very accurate.

UFs are the right way to go for an IF layer when data routing is the only requirement. An HF adds a lot of network overhead, has much lower throughput, and often ends up becoming a bottleneck.

And for the issue you mentioned, you can add line breakers in props.conf for the UF to avoid broken events. So it's not really a UF issue; it's more of a configuration problem on your end.

Cheers,

David 


hrawat_splunk
Splunk Employee

@DavidHourani My job is to list out the issues that exist when using a UF as an intermediate forwarder.

You said "line-breakers in props.conf for the UF". Well, a UF does not read props.conf for line breakers. In case you meant an event breaker on the source UF, that also does not guarantee that all events leaving the source UF are on a boundary. In fact, far too many people believe that with EVENT_BREAKER all events are on the boundary. The underlying problem with an IUF is that it simply passes partial events received from the source UF to the next layer, and by then it's too late to fix that partial event.
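For context, this is roughly what an event breaker looks like in props.conf on a source UF (the sourcetype name and timestamp pattern are hypothetical). It only influences where the UF may switch targets during load balancing; as noted above, it does not guarantee that every chunk leaving the UF ends on an event boundary.

# props.conf on the source UF (hypothetical sourcetype and pattern)
[my_custom:log]
EVENT_BREAKER_ENABLE = true
# the first capture group marks the break point between events
EVENT_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}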


hrawat_splunk
Splunk Employee

Currently only HF/Indexer/full Splunk instances are designed to handle partial events. 


hrawat_splunk
Splunk Employee

Found one more serious issue: pipeline channel corruption that caused replication queues to stay blocked (paused) for hours. Taking the IUF out (UF -> IDX) resolved the issue.
