
Wrongly merged events / permanently blocked tcpout queue with an Intermediate Universal Forwarder.

hrawat_splunk
Splunk Employee

Using a Universal Forwarder as an intermediate forwarder for source universal forwarders can cause:

  1. Events being randomly merged into one event.
  2. A permanently blocked tcpout queue on the Intermediate Universal Forwarder (IUF).

Randomly merged events. There are a few scenarios where this can happen, assuming there is more than one IUF and more than one indexer.

Let's say a source UF partially reads an event and gets restarted. The partial event is sent to IUF1, and IUF1 immediately sends it to indexer1. The incomplete event now sits in the parsing queue of indexer1.

After the restart, the source UF sends the rest of the partially read event, plus a few other complete events, to IUF2 and indexer2. The source UF then switches back to IUF1 and starts sending more events. The moment IUF1 selects indexer1, indexer1 merges the previously saved partial event into a random new event from the same source file of that source UF.

There are other scenarios where a partial event waiting in an indexer's parsing queue will be merged with some random event from the same file.
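For reference, a minimal sketch of the load-balanced topology assumed above (hostnames, ports, and group names are hypothetical): each source UF load balances its tcpout across the two IUFs, and each IUF in turn load balances across the two indexers, which is what allows a partial event and its continuation to land on different indexers.

# outputs.conf on each source UF (hypothetical hosts/ports)
[tcpout]
defaultGroup = intermediate_uf_group

[tcpout:intermediate_uf_group]
server = iuf1.example.com:9997, iuf2.example.com:9997

# outputs.conf on each IUF (hypothetical hosts/ports)
[tcpout]
defaultGroup = indexer_group

[tcpout:indexer_group]
server = idx1.example.com:9997, idx2.example.com:9997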

Another problem with having an Intermediate Universal Forwarder load balance for source Universal Forwarders is a permanently blocked tcpout queue.

The following slides explain how a tcpout queue can become permanently blocked on an Intermediate Universal Forwarder:

https://conf.splunk.com/files/2019/slides/FN1570.pdf

 

 

1 Solution

hrawat_splunk
Splunk Employee

A Universal Forwarder should never be used as an intermediate forwarder. Instead, use a Heavy Forwarder as the intermediate forwarder (IHF).

The reason: if it were IHF1 instead of IUF1, the partial event would have been sitting in the parsing queue of IHF1. The moment the source UF connection goes away, the partial event in IHF1's parsing queue is discarded.

Partial events never leave the IHF and are never wrongly merged at the IHF layer.

Partial events also never get stuck in the tcpout queue of the IHF.

So it's highly recommended to use a Heavy Forwarder as the intermediate forwarder.
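As a rough configuration sketch of that recommendation (hostnames, ports, and group names are hypothetical), the source UFs simply point outputs.conf at the heavy forwarders; because each IHF is a full Splunk Enterprise instance, parsing and line breaking happen there, and a partial event is discarded along with the broken connection instead of being forwarded on.

# outputs.conf on each source UF, pointing at the IHF layer (hypothetical hosts/ports)
[tcpout]
defaultGroup = intermediate_hf_group

[tcpout:intermediate_hf_group]
server = ihf1.example.com:9997, ihf2.example.com:9997
useACK = true

useACK is optional and independent of the UF-vs-HF question; it is shown here only because indexer acknowledgement is often enabled on this hop to reduce data loss when queues block.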

 


DavidHourani
Super Champion

This is not very accurate.

UFs are the right way to go for an IF layer when data routing is the only requirement. An HF adds a lot of network overhead, has much lower throughput, and often ends up becoming a bottleneck.

And for the issue you mentioned, you can add line breakers in props.conf for the UF to avoid broken events. So it's not really a UF issue; it's more of a configuration problem on your end.

Cheers,

David 


hrawat_splunk
Splunk Employee

@DavidHourani My job is to list out the issues that exist when using a UF as an intermediate forwarder.

You said "line-breakers in props.conf for the UF". Well, a UF does not read props.conf for line breakers. In case you meant an event breaker on the source UF, that also does not guarantee that all events leaving the source UF are on a boundary. In fact, far too many people believe that with EVENT_BREAKER all events are on the boundary. The underlying problem with an IUF is that it simply passes partial events received from the source UF to the next layer, and by then it's too late to fix that partial event.
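For context, this is roughly what an event breaker looks like in props.conf on a source UF (the sourcetype name and timestamp pattern are hypothetical). It only influences where the UF may switch targets during load balancing; as noted above, it does not guarantee that every chunk leaving the UF ends on an event boundary.

# props.conf on the source UF (hypothetical sourcetype and pattern)
[my_custom:log]
EVENT_BREAKER_ENABLE = true
# the first capture group marks the break point between events
EVENT_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}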


hrawat_splunk
Splunk Employee

Currently only HF/Indexer/full Splunk instances are designed to handle partial events. 


hrawat_splunk
Splunk Employee

Found one more serious issue: pipeline channel corruption that caused replication queues to stay blocked (paused) for hours. Taking the IUF out (UF -> IDX) resolved the issue.
