Getting Data In

Why is my Universal Forwarder showing extreme lag or latency when sending Windows event log data?

hsrawat
Explorer

A Windows 2008R2 Universal Forwarder and its indexer are located in different geographical locations. Events arrive hours behind.

There's no limit on outgoing forwarder throughput.

Clearing the Windows Security log allowed the events to catch up for a short while, but they quickly fell behind again.

1 Solution

hsrawat
Explorer

The issue is caused by small Windows events that are smaller than the MSS. Nagle's algorithm on the UF combined with TCP delayed ACK on the indexer significantly reduces throughput, and it gets worse over a WAN than over a LAN. The default 8 KB DefaultSendWindow size becomes the bottleneck.

Setting the DefaultSendWindow registry value to a reasonably higher value gets events indexed in real time again.
How to calculate an appropriate value for DefaultSendWindow?

Send buffer size = Desired Throughput * latency
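
As a worked example of the formula above, here is a minimal Python sketch. It assumes "latency" means round-trip time, and the function names and link figures (10 Mbit/s, 80 ms) are made up for illustration, not taken from this thread. The second function runs the formula in reverse to show the throughput ceiling a fixed send buffer imposes.

```python
def send_buffer_bytes(throughput_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to sustain the rate."""
    return round(throughput_bits_per_sec * rtt_seconds / 8)


def max_throughput_bits_per_sec(buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most buffer_bytes can be unacknowledged."""
    return buffer_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    rtt = 0.080  # assumed 80 ms WAN round-trip time
    # Buffer needed to sustain an assumed 10 Mbit/s over that link.
    print(send_buffer_bytes(10_000_000, rtt))
    # Ceiling imposed by an 8 KB (8192-byte) send window over the same link.
    print(max_throughput_bits_per_sec(8192, rtt))
```

Under these assumed figures, the link needs roughly 100,000 bytes of send buffer, while an 8 KB buffer caps throughput at roughly 0.8 Mbit/s regardless of available bandwidth.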

More info
https://www.switch.ch/network/tools/tcp_throughput/?do+new+calculation=do+new+calculation (use bottom calculator)
http://www.speedguide.net/faq/what-is-the-bandwidth-delay-product-185
http://www.kehlet.cx/articles/99.html
http://web.archive.org/web/20080803082218/http://dast.nlanr.net/Guides/GettingStarted/TCP_window_siz...
https://www.switch.ch/network/tools/tcp_throughput/

How to set the Windows registry value?
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters
DefaultSendWindow
Value Type: REG_DWORD

https://technet.microsoft.com/en-us/library/cc781532%28v=ws.10%29.aspx

The Windows box must be restarted after setting the value.
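
One way to apply the setting is through a .reg file. The Python sketch below only renders the file text for the Afd\Parameters key; the helper name and the 65536 value are assumptions for illustration, and the encoding and line endings your registry tooling expects should be checked before importing the file.

```python
AFD_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Afd\Parameters"
)


def render_reg_file(value_name: str, value: int) -> str:
    """Return .reg file text that sets a REG_DWORD under the Afd\\Parameters key."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD value must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{AFD_PARAMETERS_KEY}]\n"
        # .reg files store DWORDs as eight lowercase hex digits.
        f'"{value_name}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # 65536 is an assumed example; derive the real value from throughput * latency.
    print(render_reg_file("DefaultSendWindow", 65536), end="")
```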

hrawat_splunk
Splunk Employee

The tcpSendBufSz setting is available in outputs.conf and is the preferred way to fix this issue, instead of setting the DefaultSendWindow registry value.
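
A minimal Python sketch of what the resulting outputs.conf fragment could look like. Placing the setting under the global [tcpout] stanza and the 65536-byte value are assumptions for illustration; check the outputs.conf spec for your Splunk version to confirm where the setting is accepted.

```python
def render_outputs_conf(send_buf_bytes: int, stanza: str = "tcpout") -> str:
    """Return an outputs.conf fragment that sets tcpSendBufSz (in bytes)."""
    if send_buf_bytes <= 0:
        raise ValueError("send buffer size must be a positive number of bytes")
    return f"[{stanza}]\ntcpSendBufSz = {send_buf_bytes}\n"


if __name__ == "__main__":
    # 65536 is an assumed example value; derive yours from throughput * latency.
    print(render_outputs_conf(65536), end="")
```

The printed fragment would be merged into the forwarder's existing outputs.conf rather than replacing it.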

jenipherc
Splunk Employee

One of the easiest ways to identify that you have this problem is by looking at the max_age of those Windows events in Splunk's metrics.log.

index=_internal host=<forwarder> source=*metrics.log* group=per_sourcetype_thruput |timechart avg(max_age) by series useother=f

If all wineventlog:* series show a large avg(max_age) on your forwarder, and you have already adjusted the following on the forwarder:

- evt_resolve_ad_obj is set to 0 in inputs.conf
- maxKbps is set to 0 in limits.conf

and there is still a lag, consider updating tcpSendBufSz as suggested in this thread.

mkolkebeck
Path Finder

It should also be noted that if indexer acknowledgement is enabled on the forwarder, you may also observe a historically large number of these events as a result of the high latency: "TcpOutProc - Read operation timed out expecting ACK from xxx.xxx.xxx.xxx:xxxx in 300 seconds" and "TcpOutProc - Possible duplication of events with channel=...". These further compound the Universal Forwarder falling behind on events.

dfronck
Communicator

I downvoted this post because you can use tcpSendBufSz instead, so you don't have to edit the registry. tcpSendBufSz was introduced after the initial answer.

dfronck
Communicator

Instead of changing the registry, could we just add this setting to the UF config on the servers?

tcpSendBufSz = 16384

  • TCP send buffer size in bytes.
  • Useful to improve thruput with small size events like windows events.
  • Only set this value if you are a TCP/IP expert.
  • Defaults to system default.

hrawat_splunk
Splunk Employee

tcpSendBufSz was introduced after this post. So yes, setting tcpSendBufSz is the same as changing the registry value.
This setting sets SO_SNDBUF via setsockopt on the forwarder side only.
