Solved: Constant Memory growth with Universal Forwarder UDP / tcp inputs and third party forwarding enabled

hrawat_splunk · ‎11-09-2022

Constant memory growth on a Universal Forwarder (UF), with an ever-increasing number of channels.

Once the third party receiver is restarted, the UF re-sends a lot of duplicate data and frees up the channels.

hrawat_splunk · ‎11-09-2022

This issue applies only to a UF with sendCookedData=false.

You may want to check "General forwarder memory growth".

Do you see a trend where the number of channels gradually increases and one of the tcpout groups is set to sendCookedData=false?

metrics.log entries pointing to channel growth (note the rising current_size):
INFO Metrics - group=map, ingest_pipe=0, name=pipelineinputchannel, current_size=12950
INFO Metrics - group=map, ingest_pipe=0, name=pipelineinputchannel, current_size=12955
INFO Metrics - group=map, ingest_pipe=0, name=pipelineinputchannel, current_size=12959

It's a known issue with the universal forwarder: some sources (monitor/udp/tcp) may emit the EOF marker late, which leaves the UF unable to free up channels. The same config (3rd party forwarding) on an HF does not have this issue.
One way to test whether you are hitting the issue is to restart the 3rd party receiver. If UF memory drops immediately, apply the following workaround.

Until the issue is fixed by a new patch, use the following workaround.

Set the following config for the 3rd party tcpout group only.

forceTimebasedAutoLB=true

This setting force-closes connections and allows channels to be consolidated, so memory is reclaimed every `autoLBFrequency` interval.
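As a rough sketch, the workaround could look like this in outputs.conf for a group with two real receivers. The receiver addresses are placeholders for illustration, not values from this thread, and the `autoLBFrequency` line only spells out what I believe is the documented default of 30 seconds; check the outputs.conf spec for your version before relying on it.

```ini
# Sketch only: <receiver1 ip>, <receiver2 ip> and <target port> are placeholders.
[tcpout:thirdpartytcpout]
# Two distinct, valid receivers (needed on 8.2.x and older; see the note below)
server = <receiver1 ip>:<target port>, <receiver2 ip>:<target port>
# Raw (uncooked) output for the 3rd party receiver
sendCookedData = false
# Force a connection switch on every autoLBFrequency interval
forceTimebasedAutoLB = true
# Assumed default interval in seconds; shown only to make the interval explicit
autoLBFrequency = 30
```

Other tcpout groups that send cooked data to Splunk indexers are left unchanged, since the setting is meant for the 3rd party group only.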

Note: For all 8.2.x and older releases, forceTimebasedAutoLB works only if the total number of distinct, valid 3rd party target <ip address:port> combinations is > 1. If there is only one receiver, the forceTimebasedAutoLB setting is a no-op. Please don't add a dummy/non-existent <ip address:port> combination.

If your 3rd party receiver is on the same box as the UF, you should be able to get > 1 receivers by adding `127.0.0.1` to the `server` list.

[tcpout:thirdpartytcpout]
server=127.0.0.1:<target port>, <ip address of UF host>:<target port>
sendCookedData=false
forceTimebasedAutoLB=true

For 9.x UFs, if the connectionsPerTarget setting is `auto` or > 1, forceTimebasedAutoLB=true also works for single-receiver tcpout groups. The outputs.conf spec describes the setting as follows:

connectionsPerTarget = [<integer>|auto]
* The maximum number of allowed outbound connections for each target IP address
  as resolved by DNS on the machine.
* A value of "auto" or < 1 means splunkd configures a value for connections for each
  target IP address. Depending on the number of IP addresses that DNS resolves,
  splunkd sets 'connectionsPerTarget' as follows:
  * If the number of resolved target IP addresses is greater than or equal to 10,
    'connectionsPerTarget' gets a value of 1.
  * If the number of resolved target IP addresses is greater than 5
    and less than 10, 'connectionsPerTarget' gets a value of 2.
  * If the number of resolved target IP addresses is greater than 3
    and less than or equal to 5, 'connectionsPerTarget' gets a value of 3.
  * If the number of resolved target IP addresses is less than or equal to 3,
    'connectionsPerTarget' gets a value of 4.
* Default: auto
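For a 9.x UF with a single receiver, the stanza might look like the sketch below. The receiver address is a placeholder, and the explicit `connectionsPerTarget = 2` is only an illustrative choice: going by the spec text above, the default `auto` should already resolve to 4 for a single target IP address, so the line can probably be left out.

```ini
# Sketch only, for 9.x UFs: <receiver ip> and <target port> are placeholders.
[tcpout:thirdpartytcpout]
# A single valid receiver is enough when connectionsPerTarget is auto or > 1
server = <receiver ip>:<target port>
sendCookedData = false
forceTimebasedAutoLB = true
# Illustrative value; any value > 1, or the default "auto", should qualify
connectionsPerTarget = 2
```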

ravis_splunk · ‎11-18-2022

Reference: "If your 3rd party receiver is on same box as UF, then you should be able to make > 1 receivers by adding `127.0.0.1` in `server` list."

Question: If the 3rd party receiver is on the same box as the UF and the server list already has an entry for 127.0.0.1, as in

[tcpout:todisk]
server=127.0.0.1:10010

is the suggestion then to add one more entry for IP 127.0.0.1 with a dummy port?

hrawat_splunk · ‎11-21-2022

No, do not use a dummy/invalid IP address or port.

If the receiver is on the same host as the UF, one of the following applies:

If `server` already has 127.0.0.1, add the UF IP address, as per the answer.
If `server` already has the UF IP address, add 127.0.0.1, as per the answer.

[tcpout:thirdpartytcpout]
server=127.0.0.1:<target port>, <ip address of UF host>:<target port>
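Applied to the `[tcpout:todisk]` stanza from the question, that would give something like the sketch below. It assumes the receiver on port 10010 also accepts connections on the UF host's non-loopback address (for example, because it listens on all interfaces), and that the group already sends uncooked data; neither is stated in the question.

```ini
# Sketch only: <ip address of UF host> is a placeholder for the host's real address.
[tcpout:todisk]
# Same receiver, same real port, reached through two distinct valid addresses
server = 127.0.0.1:10010, <ip address of UF host>:10010
# Assumed: this group forwards raw data to the 3rd party receiver
sendCookedData = false
forceTimebasedAutoLB = true
```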

hrawat_splunk · ‎08-01-2024

The issue is fixed in 8.2.5 and above.

Constant Memory growth with Universal Forwarder UDP / tcp inputs and third party forwarding enabled.
