Why is my Universal Forwarder showing extreme lag or latency when sending Windows event log data?

Explorer

A Windows 2008 R2 Universal Forwarder and an indexer are located in different geographical locations. Events arrive hours behind.

There's no limit on outgoing forwarder throughput.

Clearing the Windows Security log allowed the events to catch up for a short while, but they quickly fell behind again.

1 Solution

Explorer

The issue is caused by small Windows events that are smaller than the TCP maximum segment size (MSS). Nagle's algorithm on the UF combined with TCP delayed ACK on the indexer significantly reduces throughput, and the effect is worse over a WAN than over a LAN. The default 8 KB DefaultSendWindow size becomes the bottleneck.

Setting the DefaultSendWindow registry value to a reasonably higher value lets events be indexed in near real time.
How do you calculate an appropriate value for DefaultSendWindow?

Send buffer size = Desired Throughput * latency
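
As a rough worked example of the formula above, this Python sketch computes a send buffer size from a target throughput and a round-trip latency, and also the throughput ceiling that a given buffer imposes. The function names and the sample figures (10 Mbit/s, 100 ms) are illustrative assumptions, not measurements from this environment.

```python
def send_buffer_bytes(throughput_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to sustain the rate."""
    return int(round(throughput_bits_per_sec * rtt_seconds / 8))


def max_throughput_bits_per_sec(buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most buffer_bytes can be unacknowledged."""
    return buffer_bytes * 8 / rtt_seconds


# Hypothetical link: 10 Mbit/s desired throughput, 100 ms round-trip latency.
needed = send_buffer_bytes(10_000_000, 0.100)

# Ceiling imposed by the default 8 KB send window on the same link.
ceiling = max_throughput_bits_per_sec(8192, 0.100)

print(f"buffer needed: {needed} bytes")
print(f"ceiling with 8 KB buffer: {ceiling / 1000:.0f} kbit/s")
```

Under these assumed figures the default 8 KB window caps the connection well below the target rate, which is consistent with the forwarder falling behind over a high-latency link.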

More info
https://www.switch.ch/network/tools/tcp_throughput/?do+new+calculation=do+new+calculation (use bottom calculator)
http://www.speedguide.net/faq/what-is-the-bandwidth-delay-product-185
http://www.kehlet.cx/articles/99.html
http://web.archive.org/web/20080803082218/http://dast.nlanr.net/Guides/GettingStarted/TCP_window_siz...
https://www.switch.ch/network/tools/tcp_throughput/

How to set the Windows registry value?
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters
DefaultSendWindow
Value Type: REG_DWORD

https://technet.microsoft.com/en-us/library/cc781532%28v=ws.10%29.aspx

Restart the Windows machine after setting the value.

Splunk Employee

The tcpSendBufSz setting is available in outputs.conf and is the preferred way to fix this issue, instead of setting the DefaultSendWindow registry value.

Splunk Employee

One of the easiest ways to identify that you have this problem is by looking at the max_age of those Windows events in Splunk's metrics.log.

index=_internal host=<forwarder> source=*metrics.log* group=per_sourcetype_thruput |timechart avg(max_age) by series useother=f

If all wineventlog:* series show a large avg(max_age) on your forwarder, and there is still lag after you have set the following on the forwarder:

- evt_resolve_ad_obj is set to 0 in inputs.conf
- maxKBps is set to 0 in limits.conf

consider setting tcpSendBufSz in outputs.conf, as suggested elsewhere in this thread.

Path Finder

Note also that if indexer acknowledgement is enabled on the forwarder, the high latency may produce an unusually large number of these events: "TcpOutProc - Read operation timed out expecting ACK from xxx.xxx.xxx.xxx:xxxx in 300 seconds" and "TcpOutProc - Possible duplication of events with channel=...". These make the Universal Forwarder fall even further behind.

Communicator

I downvoted this post because you can use tcpSendBufSz instead, so you don't have to edit the registry. tcpSendBufSz was introduced after the initial answer.

Communicator

Instead of changing the registry, could we just add this setting to the UF config on the servers?

tcpSendBufSz = 16384

  • TCP send buffer size in bytes.
  • Useful to improve thruput with small size events like windows events.
  • Only set this value if you are a TCP/IP expert.
  • Defaults to system default.

Splunk Employee

tcpSendBufSz was introduced after this post. So yes, setting tcpSendBufSz is equivalent to changing the registry value.
This setting sets SO_SNDBUF via setsockopt on the forwarder side only.
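
For readers unfamiliar with SO_SNDBUF, here is a minimal Python sketch of the same socket option being set on an ordinary TCP socket. It illustrates the operating-system call only and is not Splunk's code; the 131072-byte value is a hypothetical example.

```python
import socket

REQUESTED_BYTES = 131072  # hypothetical value; size it from throughput * latency

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Read the system default before changing anything.
    default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

    # Request a larger send buffer for this socket only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED_BYTES)

    # The OS may round, double, or cap the request, so read back the result.
    effective_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
finally:
    sock.close()

print(f"default SO_SNDBUF:   {default_size} bytes")
print(f"effective SO_SNDBUF: {effective_size} bytes")
```

The option applies per socket on the sending side, which matches the point above that the setting takes effect on the forwarder only.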
