HF doesn't accept traffic from UF

slipinski · ‎11-15-2021

Hi Splunk chaps,

I'm having a problem feeding the HF from the UF (the HF is sending data to the cloud, and this works fine). I can rule out a network or firewall issue: each server is reachable from the other side.

Below is a chunk of log errors from the UF:

11-15-2021 11:12:57.024 +0000 INFO DC:DeploymentClient [6735 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
11-15-2021 11:13:09.024 +0000 INFO DC:DeploymentClient [6735 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
11-15-2021 11:13:10.140 +0000 WARN HttpPubSubConnection [6734 HttpClientPollingThread_97C72192-9F2D-4883-830A-776376593AC1] - Unable to parse message from PubSubSvr:
11-15-2021 11:13:10.140 +0000 INFO HttpPubSubConnection [6734 HttpClientPollingThread_97C72192-9F2D-4883-830A-776376593AC1] - Could not obtain connection, will retry after=70.985 seconds.
11-15-2021 11:13:17.695 +0000 WARN TcpOutputProc [3551 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=172.23.11.216 inside output group default-autolb-group from host_src=ldcrapnvvip10 has been blocked for blocked_seconds=446600. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
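To put the blocked_seconds figure in that last warning into perspective, here is a minimal sketch that pulls the key=value pairs out of the line and converts the duration. The sample line is shortened from the warning above; the helper names parse_fields and blocked_for are made up for illustration and are not part of any Splunk tooling.

```python
import re
from datetime import timedelta

# Shortened copy of the TcpOutputProc warning quoted above.
LINE = (
    "11-15-2021 11:13:17.695 +0000 WARN TcpOutputProc [3551 parsing] - "
    "The TCP output processor has paused the data flow. Forwarding to "
    "host_dest=172.23.11.216 inside output group default-autolb-group "
    "from host_src=ldcrapnvvip10 has been blocked for blocked_seconds=446600."
)


def parse_fields(line):
    """Return the key=value pairs found in a log line (hypothetical helper)."""
    # Values run up to the next whitespace; drop a sentence-ending period.
    return {key: value.rstrip(".") for key, value in re.findall(r"(\w+)=(\S+)", line)}


def blocked_for(line):
    """Return how long the output has been blocked, as a timedelta."""
    return timedelta(seconds=int(parse_fields(line)["blocked_seconds"]))


print(blocked_for(LINE))  # 5 days, 4:03:20
```

So at that point the UF had not delivered anything to 172.23.11.216 for a little over five days.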

Please see the debug output from the UF.

/opt/splunkforwarder/etc/system/default/outputs.conf [syslog]
/opt/splunkforwarder/etc/system/default/outputs.conf maxEventSize = 1024
/opt/splunkforwarder/etc/system/default/outputs.conf priority = <13>
/opt/splunkforwarder/etc/system/default/outputs.conf type = udp
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf [tcpout]
/opt/splunkforwarder/etc/system/default/outputs.conf ackTimeoutOnShutdown = 30
/opt/splunkforwarder/etc/system/default/outputs.conf autoLBFrequency = 30
/opt/splunkforwarder/etc/system/default/outputs.conf autoLBVolume = 0
/opt/splunkforwarder/etc/system/default/outputs.conf blockOnCloning = true
/opt/splunkforwarder/etc/system/default/outputs.conf blockWarnThreshold = 100
/opt/splunkforwarder/etc/system/default/outputs.conf cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
/opt/splunkforwarder/etc/system/default/outputs.conf compressed = false
/opt/splunkforwarder/etc/system/default/outputs.conf connectionTTL = 0
/opt/splunkforwarder/etc/system/default/outputs.conf connectionTimeout = 20
/opt/splunkforwarder/etc/system/local/outputs.conf defaultGroup = default-autolb-group
/opt/splunkforwarder/etc/system/default/outputs.conf disabled = false
/opt/splunkforwarder/etc/system/default/outputs.conf dropClonedEventsOnQueueFull = 5
/opt/splunkforwarder/etc/system/default/outputs.conf dropEventsOnQueueFull = -1
/opt/splunkforwarder/etc/system/default/outputs.conf ecdhCurves = prime256v1, secp384r1, secp521r1
/opt/splunkforwarder/etc/system/default/outputs.conf forceTimebasedAutoLB = false
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.0.whitelist = .*
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.1.blacklist = _.*
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.filter.disable = false
/opt/splunkforwarder/etc/system/default/outputs.conf heartbeatFrequency = 30
/opt/splunkforwarder/etc/system/default/outputs.conf indexAndForward = false
/opt/splunkforwarder/etc/system/default/outputs.conf maxConnectionsPerIndexer = 2
/opt/splunkforwarder/etc/system/default/outputs.conf maxFailuresPerInterval = 2
/opt/splunkforwarder/etc/system/default/outputs.conf maxQueueSize = auto
/opt/splunkforwarder/etc/system/default/outputs.conf readTimeout = 300
/opt/splunkforwarder/etc/system/default/outputs.conf secsInFailureInterval = 1
/opt/splunkforwarder/etc/system/default/outputs.conf sendCookedData = true
/opt/splunkforwarder/etc/system/default/outputs.conf sslQuietShutdown = false
/opt/splunkforwarder/etc/system/default/outputs.conf sslVersions = tls1.2
/opt/splunkforwarder/etc/system/default/outputs.conf tcpSendBufSz = 0
/opt/splunkforwarder/etc/system/default/outputs.conf useACK = false
/opt/splunkforwarder/etc/system/default/outputs.conf useClientSSLCompression = true
/opt/splunkforwarder/etc/system/default/outputs.conf writeTimeout = 300
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout-server://172.23.11.216:9997]
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout:default-autolb-group]
/opt/splunkforwarder/etc/system/local/outputs.conf disabled = false
/opt/splunkforwarder/etc/system/local/outputs.conf server = 172.23.11.216:9997

Any ideas what is blocking it?

Thanks in advance,

Sz

slipinski · ‎11-16-2021

@PickleRick I double-checked the TLS/SSL configuration on both sides. It looks like the default settings, and they are the same on both.

I've checked memory utilization on the HF and it's quite high: 96% of memory is consumed. Could that be the culprit?

PickleRick · ‎11-16-2021

OK, because your initial posts weren't clear on this.

Do you have a splunktcp:9997 (or splunktcp-ssl) input enabled? I think I only saw an http input. Or maybe you enabled a plain tcp:9997 input instead of splunktcp?

slipinski · ‎11-17-2021

inputs.conf file from $SPLUNK_HOME/etc/apps/search/local:

[splunktcp://9997]
disabled = false

Forgive my ignorance, but should I add this stanza to the $SPLUNK_HOME/etc/system/local/inputs.conf file as well?

PickleRick · ‎11-17-2021

From a technical point of view, you don't have to.

https://docs.splunk.com/Documentation/Splunk/8.2.3/Admin/Wheretofindtheconfigurationfiles

It's just that if you don't keep your configs "tidy", they can get confusing quickly with settings being spread all over the place 🙂

Hmm... but if you have a splunktcp input and you can see a TLS handshake over the wire, then the UF must be applying some TLS settings and trying to negotiate a secure connection.

isoutamo · ‎11-17-2021

No need for that. It's enough that this stanza is in place somewhere. Of course, the best practice is to have your own apps that contain those configurations, so they are easily managed, stored (e.g. in git) and deployed to the HFs and UFs that need them.

Is this the only UF-HF pair that is not working, or are there several? Are any others working?

You probably followed https://docs.splunk.com/Documentation/Splunk/8.2.3/Forwarding/Configureanintermediateforwarder when you configured this? And you did the needed restarts after the configuration changes?

What kind of errors do you have in your UF's and HF's internal logs? Do those give any hints?

Can you give (again) the output of the following commands:

From UF:

splunk btool outputs list

From HF:

splunk btool inputs list splunktcp

r. Ismo

slipinski · ‎11-17-2021

From UF

[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTTL = 0
connectionTimeout = 20
defaultGroup = default-autolb-group
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = false
useClientSSLCompression = true
writeTimeout = 300
[tcpout-server://172.23.11.216:9997]
[tcpout:default-autolb-group]
disabled = false
server = 172.23.11.216:9997
sslPassword = password

From UF

[splunktcp]
_rcvbuf = 1572864
acceptFrom = *
connection_host = ip
host = $decideOnStartup
index = default
route = has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
[splunktcp://9997]
_rcvbuf = 1572864
connection_host = 10.24.118.91
disabled = false
host = $decideOnStartup
index = discol

I've only just added the connection_host setting, but it didn't make any difference.

Logfile:

11-17-2021 09:31:50.643 +0000 WARN TcpOutputProc [18893 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=172.23.11.216 inside output group default-autolb-group from host_src=ldcrapnvvip10 has been blocked for blocked_seconds=47000. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

11-17-2021 09:32:26.647 +0000 ERROR TcpOutputFd [18894 TcpOutEloop] - Read error. Connection reset by peer
11-17-2021 09:32:26.648 +0000 ERROR TcpOutputFd [18894 TcpOutEloop] - Read error. Connection reset by peer
11-17-2021 09:32:26.648 +0000 WARN AutoLoadBalancedConnectionStrategy [18894 TcpOutEloop] - Applying quarantine to ip=172.23.11.216 port=9997 _numberOfFailures=2

PickleRick · ‎11-17-2021

Do you have any IPS or something like that in between those two components?

Because it looks awfully similar to an over-zealous "protection" solution which resets "bad" TLS connections. Notice that each side blames the other one for the connection closing.

I've seen this myself - an IPS/NGFW/whatever notices some "strange" communication (from its point of view) and sends an RST to both sides.

slipinski · ‎11-17-2021

The thing is that the network isn't managed by me. I was just told to use the HF (the UF doesn't have access to the internet) and get everything up and running.

But according to the customer, who is managing the network, there is nothing that could block communication between HF and UF.

PickleRick · ‎11-17-2021

I'm not saying I'm 100% sure of this, because I cannot be, since I don't know the infrastructure. But I've seen such behaviour before, and that customer was also saying he didn't have any firewalls in place. In the end it turned out he did 😄

But just to be on the safe side - run tcpdump on both ends of the connection, restart the UF so it tries to connect to the HF, and compare the pcaps from both ends.

If you have a proper 3-way handshake and then suddenly two RSTs which apparently "no one" sent - you have your answer.
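The comparison could be sketched roughly like this, assuming each capture has already been reduced to simple packet summaries (source, destination, tcpdump-style flags, sequence number). The Pkt type, the injected_resets helper and the sample addresses are all made up for illustration (the addresses come from the documentation ranges, not from this environment), and real captures would also need ports and NAT taken into account.

```python
from typing import List, NamedTuple


class Pkt(NamedTuple):
    """Hypothetical one-line summary of a captured TCP segment."""
    src: str    # source IP
    dst: str    # destination IP
    flags: str  # tcpdump-style flags: "S", "S.", ".", "R", "R." ...
    seq: int    # (relative) sequence number


def injected_resets(local: List[Pkt], remote: List[Pkt], peer_ip: str) -> List[Pkt]:
    """RSTs that arrived locally 'from' peer_ip but are missing from the peer's own capture."""
    sent_by_peer = {
        (p.src, p.dst, p.seq)
        for p in remote
        if "R" in p.flags and p.src == peer_ip
    }
    return [
        p for p in local
        if "R" in p.flags and p.src == peer_ip
        and (p.src, p.dst, p.seq) not in sent_by_peer
    ]


# Made-up sample data: a clean handshake, then each side sees an RST
# that claims to come from the other side.
UF, HF = "192.0.2.10", "198.51.100.20"
handshake = [Pkt(UF, HF, "S", 0), Pkt(HF, UF, "S.", 0), Pkt(UF, HF, ".", 1)]
uf_capture = handshake + [Pkt(HF, UF, "R", 1)]
hf_capture = handshake + [Pkt(UF, HF, "R", 1)]

print(injected_resets(uf_capture, hf_capture, peer_ip=HF))
print(injected_resets(hf_capture, uf_capture, peer_ip=UF))
```

If both calls return a non-empty list, each end received a reset that the other end never put on the wire, which points at something in the middle.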

isoutamo · ‎11-17-2021

Can you also add the inputs / splunktcp output from the HF, not the UF?

slipinski · ‎11-17-2021

My bad. It's a typo. The second output is from the HF.

Anyway, I'm pasting it here again:

[splunktcp]
_rcvbuf = 1572864
acceptFrom = *
connection_host = ip
host = $decideOnStartup
index = default
route = has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
[splunktcp://9997]
_rcvbuf = 1572864
connection_host = 10.24.118.91
disabled = false
host = $decideOnStartup
index = discol

gcusello · ‎11-16-2021

Hi @slipinski,

Please try telnet to the HF on port 9997.
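If telnet isn't installed on the UF host, the same check can be approximated with a few lines of Python. This is only a sketch: can_connect is a made-up helper name, and like telnet it only shows that the TCP port accepts connections, not that the Splunk-to-Splunk protocol or any TLS negotiation on top of it succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call, using the HF address from the outputs.conf above:
# can_connect("172.23.11.216", 9997)
```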

Ciao.

Giuseppe

slipinski · ‎11-16-2021

@gcusello I did, and it connected successfully.
