Splunk Forwarders and Forced Time Based Load Balancing

iam_dd · ‎08-12-2022

Splunk customers use universal forwarders to collect and send data to Splunk. A universal forwarder can send data to multiple Splunk receivers (indexers). Distributing data this way enables linear scaling and improves search-time performance. The distribution is handled by the load balancing built into the universal forwarder. This article covers auto load-balancing, its manual override in certain situations, and the associated considerations.

By default, the universal forwarder performs auto load-balancing: it randomizes the list of receivers provided in the server setting of the target group stanza and sends data to them in turn, visiting each target once before repeating.
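As a rough sketch of where that list lives, a target group in outputs.conf on the forwarder might look like the following. The group name my_indexers and the hostnames are hypothetical placeholders, not values from this article; 9997 is the receiving port that appears in the log samples later in this article.

```ini
# outputs.conf on the universal forwarder
# Group name and hostnames below are hypothetical placeholders.
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
# The forwarder randomizes this list and visits each receiver once
# before repeating.
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
```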

A universal forwarder always performs auto load-balancing. The switch from one receiver to another can be based on time (autoLBFrequency) or on the volume of data sent (autoLBVolume). If both settings have values, the switch happens on whichever condition is met first. The universal forwarder switches only when it is safe to do so: for example, at EOF plus the 3-second time_before_close when tailing a file, when it finds the next event breaker, or, for TCP streams, when it has received nothing for 10 seconds (rawTcpDoneTimeout). In this way, the universal forwarder tries to ensure that a receiver gets data that ends on a clean boundary.
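To make the two switching triggers concrete, here is a minimal sketch that sets both. The group name, hostnames, monitored path, TCP port, and the 1 MB volume threshold are assumptions chosen for illustration; autoLBVolume is expressed in bytes. The inputs.conf lines simply spell out the 3-second and 10-second values mentioned above.

```ini
# outputs.conf (hypothetical group and hosts)
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# Switch receivers every 30 seconds...
autoLBFrequency = 30
# ...or after roughly 1 MB has been sent, whichever comes first.
autoLBVolume = 1048576

# inputs.conf (hypothetical path and port)
[monitor:///var/log/app/app.log]
# Seconds to wait after EOF before the file is considered done.
time_before_close = 3

[tcp://5514]
# Seconds of silence after which a raw TCP stream is considered done.
rawTcpDoneTimeout = 10
```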

With very large files, such as a Bluecoat proxy file or a chatty syslog file, or when loading a large amount of historical data, a forwarder keeps its connection to one receiver until it has finished sending the entire file. This means the other receivers may be underused if there is no heavy forwarder between the universal forwarder and the indexers. To mitigate this, you can use the forceTimebasedAutoLB setting of the universal forwarder. With this setting, the universal forwarder does not wait for a safe logical point and instead makes a hard switch to a different receiver at a frequency equal to the value of autoLBFrequency. Here are sample settings in outputs.conf:

autoLBFrequency = 30
forceTimebasedAutoLB = true

How does forceTimebasedAutoLB work?

When a forwarder switches from the first receiver to the second, it takes the last chunk of data it sent to the first receiver, adds a control key at the end of that chunk, and appends the next chunk.

The universal forwarder sends these two combined chunks to the second receiver. The first receiver parses the chunk it received from the end toward the beginning, looking for the first event boundary, using the line-breaking setting in props.conf at the receiver:

LINE_BREAKER = <regular expression>

* Specifies a regex that determines how the raw text stream is broken into initial events, before line merging takes place.

Unless the chunk ends on a clean boundary, the first receiver drops the data after that event boundary (the partial event at the end of the chunk) and pushes the rest of the data, up to that clean boundary, for indexing.

The second receiver first looks for a clean boundary, starting at the end of the second data chunk and traversing toward its beginning, and drops the partial event at the end. It then keeps traversing toward the beginning, looking for the control key. Once it finds the control key, it continues from the control key into the previous data chunk, looking for an event boundary. Once it finds that clean boundary, it takes the data from that point on and pushes it for indexing. This way, the second receiver gets any partial event data that the first receiver dropped.

The illustration of this process shows IDX2 getting the part of EVT4 from data chunk 1 that the forwarder sent to IDX1. IDX2 ultimately indexes EVT4 and EVT5 in full and, just like IDX1, drops the partial EVT6.
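For a concrete, purely illustrative case, suppose events of a hypothetical sourcetype my_app_log each start with an ISO-style date. A receiver-side props.conf sketch could then define the event boundary as follows; the sourcetype name and the regex are assumptions, not settings from this article.

```ini
# props.conf on the receiver (indexer); sourcetype name is hypothetical
[my_app_log]
# The first capture group is the text between events (here, newlines).
# The next event must start with a date such as 2022-04-11.
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Treat each broken segment as a complete event; skip line merging.
SHOULD_LINEMERGE = false
```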

What happens when an event is very large and forceTimebasedAutoLB is enabled?

You may come across a situation where event breakers are not defined, or where the event is simply too large to fit into one chunk (64KB by default). In such cases, the receiver doesn’t find an event boundary and the indexer receives a partial event. In the illustration of this situation, EVT3 is partially indexed (its beginning is lost).

Is there a way to completely avoid any partial event indexing?

Yes. Enable the event breaker at the universal forwarder level, in props.conf. When you enable the event breaker, you do not need to configure forceTimebasedAutoLB. An event breaker defined with a regex lets the forwarder create data chunks with clean boundaries, so that autoLB kicks in and switches the connection at the end of an event, and a receiver always receives data with clean boundaries.

So what happens if, despite the event breaking rule, the event is still large? In the above example, EVT3 is fully delivered to one receiver before the connection switches, so no event is dropped. Note, however, that event breaking adds processing overhead. Here’s a sample setting for this purpose:

[sourcetype]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = <regex>
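Filling in the placeholders, a forwarder-side sketch for a hypothetical sourcetype whose events each start with an ISO-style date might look like this. The sourcetype name my_app_log and the regex are assumptions for illustration; the regex should match your own event boundaries, and its first capture group marks the break between events.

```ini
# props.conf on the universal forwarder; sourcetype name is hypothetical
[my_app_log]
EVENT_BREAKER_ENABLE = true
# Break only at newlines that are followed by a date such as 2022-04-11,
# so the forwarder switches receivers only between events.
EVENT_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
```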

Can there be data loss while using forceTimebasedAutoLB?

Yes, there can be data loss while using the forceTimebasedAutoLB option. Imagine a receiver that is overwhelmed, whether by search load or replication, or that simply cannot keep up with the incoming data. In that case, the tcpinput queue is blocked at the indexer/receiver. Since the receiver can no longer take more data, the forwarder waits until the connection times out (connectionTTL), at which point all the data sitting in the forwarder output queue is dropped. The receiver also drops the data sitting in its tcpinput queue. It is difficult to say how much data is dropped in such situations, since it is managed at the TCP layer (sliding window/queueing). However, a warning message is written to the forwarder log:

04-11-2022 07:51:01.500 +0000 INFO TcpOutputProc - Connected to idx=12.34.567.89:9997, pset=0, reuse=0.

04-11-2022 07:51:13.138 +0000 WARN TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to host_dest=inputs1.xxxxxxx inside output group abc-group from host_src=idx-i-xxxxxxxxxxx has been blocked for blocked_seconds=100. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

04-11-2022 07:51:36.560 +0000 INFO TcpOutputProc - Closing stream for idx=12.34.567.89:9997

The forwarder quarantines such an indexer as unhealthy. It rechecks the indexer’s health status at the interval given by the heartbeatFrequency = <seconds (integer)> setting in outputs.conf. If the status is healthy, the indexer is removed from the quarantine list and becomes available for reuse.

Can this data loss be avoided with the use of useAck?

Such data losses can be avoided with the use of acknowledgements. The useAck setting, false by default, protects against the loss of in-flight data. You can set useAck globally in the [tcpout] stanza or per target group in a [tcpout:<target_group>] stanza. Note that even useAck does not provide a 100% data guarantee, for reasons beyond the control of the universal forwarder software (more on this later).

When useAck is enabled, the indexer receives a block of data (~64KB), parses it, writes the data to the file system as events, and then sends an acknowledgement to the universal forwarder. The forwarder keeps a copy of this block in memory, in its wait queue, until it receives the acknowledgement. The wait queue is always 3 times the size of the output queue, which you can set with maxQueueSize. By default, maxQueueSize is auto, which means that with useAck enabled the output queue is 7MB and the wait queue is 21MB. Without useAck, the output queue with maxQueueSize=auto is only 500KB.
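A minimal sketch of enabling acknowledgements for one target group follows. The group name and hostnames are hypothetical. The setting is written useACK here because, to the best of our knowledge, that is the spelling used in outputs.conf.spec; check the spec for your version.

```ini
# outputs.conf on the universal forwarder (hypothetical group and hosts)
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true
# auto: with acknowledgements on, a 7MB output queue and a
# 3 x 7MB = 21MB wait queue, per the figures above.
maxQueueSize = auto
```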

By default, if the universal forwarder does not get an acknowledgement from the indexer within 300 seconds (because of a network issue, the indexer going down, or the indexer failing to write to the file system), it closes the connection to the indexer. You can use the readTimeout setting to override the 300-second limit. The universal forwarder then creates a connection to another indexer (in the case of autoLB) or another connection to the same indexer, and resends the data. The forwarder also performs an additional check based on the number of events: if it has not received any acknowledgement from a receiver after 3,000 events, it stops sending data to that receiver.

Once the wait queue is full, the forwarder stops sending data until at least one block is acknowledged. If an acknowledgement from the indexer is lost due to network issues, the forwarder resends the block, which leads to duplication, and writes a WARN log entry. With useAck enabled, it is important to keep heartbeatFrequency lower than writeTimeout to ensure that the connection is healthy.
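The timing settings discussed above can be laid out together as a sketch. The values shown are illustrative choices, and the group name and hostnames are hypothetical; the point is only that heartbeatFrequency stays below writeTimeout.

```ini
# outputs.conf on the universal forwarder (hypothetical group and hosts)
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
useACK = true
# Seconds to wait for an acknowledgement before closing the connection
# (overrides the 300-second default).
readTimeout = 600
# Seconds to wait for a write to the receiver's socket to complete.
writeTimeout = 300
# Seconds between heartbeats; kept well below writeTimeout.
heartbeatFrequency = 30
```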

Why can’t you guarantee 100% data delivery despite using useAck?

Even with acknowledgements (useAck), it is not possible to cover everything that could go wrong. For example, if a file is deleted before the universal forwarder reads it, the data is lost. The ideal situation is that ingestion queues are clear with no blockages, and events are read by the universal forwarder within three seconds and then transmitted to Splunk, parsed, and indexed. In this situation, useAck provides excellent guarantees against data loss.

Other issues can also affect data assurance. For example, if the forwarder crashes after writing a block to the network (at the TCP level), wiping out the wait queue, and at the same time the indexer goes down or fails to write to the file system, that block of data is lost. What is the maximum possible data loss? Since the forwarder doesn’t wait for an acknowledgement before sending the next block to the indexer, the maximum possible loss is the size of the wait queue, that is, 21MB for the default configuration. The wait queue can fill up even when the indexer is working fine, because the indexer is waiting on disk writes or is busy serving search requests. To optimize I/O, an indexer writes to the file system when its write queue is full or when it times out (doesn’t get data) for a few seconds.

We hope you have found this information helpful!

— DD Sharma, Engineering Leader at Splunk
