Getting Data In

HF doesn't accept traffic from UF

slipinski
Path Finder

Hi Splunk chaps, 

I'm facing a problem with feeding the HF from a UF (the HF is sending data to the cloud and this works fine). I can rule out a network or firewall issue: both servers are reachable from the opposite side.

Below is a chunk of the errors logged on the UF:

11-15-2021 11:12:57.024 +0000 INFO DC:DeploymentClient [6735 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
11-15-2021 11:13:09.024 +0000 INFO DC:DeploymentClient [6735 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
11-15-2021 11:13:10.140 +0000 WARN HttpPubSubConnection [6734 HttpClientPollingThread_97C72192-9F2D-4883-830A-776376593AC1] - Unable to parse message from PubSubSvr:
11-15-2021 11:13:10.140 +0000 INFO HttpPubSubConnection [6734 HttpClientPollingThread_97C72192-9F2D-4883-830A-776376593AC1] - Could not obtain connection, will retry after=70.985 seconds.
11-15-2021 11:13:17.695 +0000 WARN TcpOutputProc [3551 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=172.23.11.216 inside output group default-autolb-group from host_src=ldcrapnvvip10 has been blocked for blocked_seconds=446600. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

Please see the debug output from the UF:

/opt/splunkforwarder/etc/system/default/outputs.conf [syslog]
/opt/splunkforwarder/etc/system/default/outputs.conf maxEventSize = 1024
/opt/splunkforwarder/etc/system/default/outputs.conf priority = <13>
/opt/splunkforwarder/etc/system/default/outputs.conf type = udp
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf [tcpout]
/opt/splunkforwarder/etc/system/default/outputs.conf ackTimeoutOnShutdown = 30
/opt/splunkforwarder/etc/system/default/outputs.conf autoLBFrequency = 30
/opt/splunkforwarder/etc/system/default/outputs.conf autoLBVolume = 0
/opt/splunkforwarder/etc/system/default/outputs.conf blockOnCloning = true
/opt/splunkforwarder/etc/system/default/outputs.conf blockWarnThreshold = 100
/opt/splunkforwarder/etc/system/default/outputs.conf cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
/opt/splunkforwarder/etc/system/default/outputs.conf compressed = false
/opt/splunkforwarder/etc/system/default/outputs.conf connectionTTL = 0
/opt/splunkforwarder/etc/system/default/outputs.conf connectionTimeout = 20
/opt/splunkforwarder/etc/system/local/outputs.conf defaultGroup = default-autolb-group
/opt/splunkforwarder/etc/system/default/outputs.conf disabled = false
/opt/splunkforwarder/etc/system/default/outputs.conf dropClonedEventsOnQueueFull = 5
/opt/splunkforwarder/etc/system/default/outputs.conf dropEventsOnQueueFull = -1
/opt/splunkforwarder/etc/system/default/outputs.conf ecdhCurves = prime256v1, secp384r1, secp521r1
/opt/splunkforwarder/etc/system/default/outputs.conf forceTimebasedAutoLB = false
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.0.whitelist = .*
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.1.blacklist = _.*
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf forwardedindex.filter.disable = false
/opt/splunkforwarder/etc/system/default/outputs.conf heartbeatFrequency = 30
/opt/splunkforwarder/etc/system/default/outputs.conf indexAndForward = false
/opt/splunkforwarder/etc/system/default/outputs.conf maxConnectionsPerIndexer = 2
/opt/splunkforwarder/etc/system/default/outputs.conf maxFailuresPerInterval = 2
/opt/splunkforwarder/etc/system/default/outputs.conf maxQueueSize = auto
/opt/splunkforwarder/etc/system/default/outputs.conf readTimeout = 300
/opt/splunkforwarder/etc/system/default/outputs.conf secsInFailureInterval = 1
/opt/splunkforwarder/etc/system/default/outputs.conf sendCookedData = true
/opt/splunkforwarder/etc/system/default/outputs.conf sslQuietShutdown = false
/opt/splunkforwarder/etc/system/default/outputs.conf sslVersions = tls1.2
/opt/splunkforwarder/etc/system/default/outputs.conf tcpSendBufSz = 0
/opt/splunkforwarder/etc/system/default/outputs.conf useACK = false
/opt/splunkforwarder/etc/system/default/outputs.conf useClientSSLCompression = true
/opt/splunkforwarder/etc/system/default/outputs.conf writeTimeout = 300
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout-server://172.23.11.216:9997]
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout:default-autolb-group]
/opt/splunkforwarder/etc/system/local/outputs.conf disabled = false
/opt/splunkforwarder/etc/system/local/outputs.conf server = 172.23.11.216:9997

 

Any ideas what blocks it? 

thanks in advance,

Sz

 


slipinski
Path Finder

@PickleRick I double-checked the TLS/SSL configuration on both sides. It looks like the default settings, and they are the same on both.

I've checked memory utilization on the HF and it's quite high: 96% of memory is consumed. Could that be the culprit?


PickleRick
SplunkTrust

OK, because your initial posts weren't clear on this.

Do you have a splunktcp:9997 (or splunktcp-ssl) input enabled? I think I only saw an http input. Or maybe you enabled a plain tcp:9997 input instead of splunktcp?

 


slipinski
Path Finder

inputs.conf file from $SPLUNK_HOME/etc/apps/search/local

[splunktcp://9997]
disabled = false

Forgive my ignorance, but should I add this stanza to the $SPLUNK_HOME/etc/system/local/inputs.conf file as well?

 


PickleRick
SplunkTrust

From the technical point of view - you don't have to.

https://docs.splunk.com/Documentation/Splunk/8.2.3/Admin/Wheretofindtheconfigurationfiles

It's just that if you don't keep your configs "tidy", they can get confusing quickly with settings being spread all over the place 🙂
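To see where each effective setting actually comes from, `splunk btool <conf> list --debug` prefixes every line with the file that supplied it. Below is a minimal sketch in Python (the helper names are my own, not anything Splunk ships) that groups such output by file and hides everything coming from a `default` directory. It assumes paths without spaces; the sample input is a handful of lines copied from the outputs.conf listing in the first post.

```python
from collections import defaultdict


def settings_by_file(btool_debug_output):
    """Group lines of `splunk btool <conf> list --debug` output by source file."""
    grouped = defaultdict(list)
    for line in btool_debug_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<path to .conf file> <stanza header or key = value>".
        # Assumption: the path itself contains no spaces.
        path, _, setting = line.partition(" ")
        grouped[path].append(setting.strip())
    return dict(grouped)


def non_default(grouped):
    """Keep only files that do not live in a .../default/ directory."""
    return {path: s for path, s in grouped.items() if "/default/" not in path}


# Sample input: lines taken from the UF output in the first post.
sample = """\
/opt/splunkforwarder/etc/apps/SplunkUniversalForwarder/default/outputs.conf [tcpout]
/opt/splunkforwarder/etc/system/default/outputs.conf connectionTimeout = 20
/opt/splunkforwarder/etc/system/local/outputs.conf defaultGroup = default-autolb-group
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout:default-autolb-group]
/opt/splunkforwarder/etc/system/local/outputs.conf server = 172.23.11.216:9997
"""

local_only = non_default(settings_by_file(sample))
for path, settings in local_only.items():
    print(path)
    for setting in settings:
        print("    " + setting)
```

With this sample only etc/system/local/outputs.conf is left, carrying the three locally configured entries, which makes it easier to spot settings that were added by hand in unexpected places.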

Hmm... but if you have a splunktcp input and you can see a TLS handshake over the wire, then the UF must be applying some TLS settings and trying to negotiate a secure connection.


isoutamo
SplunkTrust

No need for that. It's enough that this stanza is in place somewhere. Of course, the best practice is to have your own apps which contain those configurations, so they are easily managed, stored (e.g. in git) and deployed to the HFs and UFs that need them.

Is this the only UF-HF pair which is not working, or are there several? Are other pairs working?

You probably followed https://docs.splunk.com/Documentation/Splunk/8.2.3/Forwarding/Configureanintermediateforwarder when you configured this? And you did the needed restarts after the configuration changes?

What kind of errors do you have in your UF's and HF's internal logs? Can those give any hints?
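If the logs are noisy, filtering splunkd.log down to the forwarding-related components makes the relevant lines easier to spot. A rough sketch in Python follows. The component list is just my pick based on the messages quoted in this thread, the function name is made up, and the sample lines are copied from the first post (the second one shortened).

```python
import re

# Timestamp, log level and component at the start of a splunkd.log line.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)\s"
)

# Assumption: these are the components worth watching for forwarding issues.
FORWARDING_COMPONENTS = {
    "TcpOutputProc",
    "TcpOutputFd",
    "AutoLoadBalancedConnectionStrategy",
}


def forwarding_problems(lines):
    """Return (timestamp, level, component) for WARN/ERROR forwarding lines."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if (
            m
            and m["level"] in {"WARN", "ERROR"}
            and m["component"] in FORWARDING_COMPONENTS
        ):
            hits.append((m["ts"], m["level"], m["component"]))
    return hits


sample = [
    "11-15-2021 11:12:57.024 +0000 INFO DC:DeploymentClient [6735 PhonehomeThread] - "
    "channel=tenantService/handshake Will retry sending handshake message to DS; "
    "err=not_connected",
    # Message shortened for readability.
    "11-15-2021 11:13:17.695 +0000 WARN TcpOutputProc [3551 parsing] - "
    "The TCP output processor has paused the data flow.",
]

problems = forwarding_problems(sample)
for ts, level, component in problems:
    print(ts, level, component)
```

On a real system the same function can be fed the log file directly, e.g. `forwarding_problems(open(path))`, where `path` would normally be $SPLUNK_HOME/var/log/splunk/splunkd.log.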

Can you give (again) the output of the following commands:

From UF:

splunk btool outputs list

From HF:

splunk btool inputs list splunktcp

r. Ismo


slipinski
Path Finder

From UF

[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTTL = 0
connectionTimeout = 20
defaultGroup = default-autolb-group
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = false
useClientSSLCompression = true
writeTimeout = 300
[tcpout-server://172.23.11.216:9997]
[tcpout:default-autolb-group]
disabled = false
server = 172.23.11.216:9997
sslPassword = password

From UF

[splunktcp]
_rcvbuf = 1572864
acceptFrom = *
connection_host = ip
host = $decideOnStartup
index = default
route = has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
[splunktcp://9997]
_rcvbuf = 1572864
connection_host = 10.24.118.91
disabled = false
host = $decideOnStartup
index = discol

I've only just added the connection_host setting, but it didn't make any difference.

Logfile:

11-17-2021 09:31:50.643 +0000 WARN TcpOutputProc [18893 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=172.23.11.216 inside output group default-autolb-group from host_src=ldcrapnvvip10 has been blocked for blocked_seconds=47000. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

11-17-2021 09:32:26.647 +0000 ERROR TcpOutputFd [18894 TcpOutEloop] - Read error. Connection reset by peer
11-17-2021 09:32:26.648 +0000 ERROR TcpOutputFd [18894 TcpOutEloop] - Read error. Connection reset by peer
11-17-2021 09:32:26.648 +0000 WARN AutoLoadBalancedConnectionStrategy [18894 TcpOutEloop] - Applying quarantine to ip=172.23.11.216 port=9997 _numberOfFailures=2

 


PickleRick
SplunkTrust

Do you have any IPS or something like that in between those two components?

Because it looks awfully similar to an over-zealous "protection" solution which resets "bad" TLS connections. Notice that each side blames the other one for the connection closing.

I've seen this myself - an IPS/NGFW/whatever notices some "strange" communication (from its point of view) and sends RST to both sides.


slipinski
Path Finder

The thing is that the network isn't being managed by me. I was just told to use the HF (the UF doesn't have access to the internet) and get everything up and running.

But according to the customer, who is managing the network, there is nothing that could block communication between HF and UF.


PickleRick
SplunkTrust

I'm not saying I'm 100% sure of this in this case; I can't be, since I don't know the infrastructure. But I've seen such behaviour before, where the customer also said he didn't have any firewalls in place, and in the end it turned out he did 😄

But just to be on the safe side - run tcpdump on both ends of the connection, restart the UF so it tries to connect to the HF, and compare the pcaps from both ends.

If you have your proper 3-way handshake and then suddenly two RSTs which apparently "no one" sent - you have your answer.
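Comparing the two captures by eye is fine, but the idea can also be sketched in code. The snippet below is only an illustration in Python: it assumes the captures were printed as text (for example with `tcpdump -nn -r file.pcap`), that no NAT sits between the hosts, and the sample lines use made-up documentation addresses (192.0.2.x), not real traffic from this thread. It lists resets that show up in one capture but not in the other, i.e. RSTs that the supposed sender never put on the wire.

```python
import re

# Matches the address and flag part of tcpdump's default text output, e.g.
# "IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [R.], ..."
FLAGS_LINE = re.compile(
    r"IP6? (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
)


def rst_packets(tcpdump_text):
    """Return the set of (src, dst) pairs for packets with the RST flag set."""
    resets = set()
    for line in tcpdump_text.splitlines():
        m = FLAGS_LINE.search(line)
        if m and "R" in m["flags"]:
            resets.add((m["src"], m["dst"]))
    return resets


# Hypothetical captures: 192.0.2.10 plays the UF, 192.0.2.20 plays the HF.
uf_capture = """\
12:00:00.000100 IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [S], seq 100, win 64240, length 0
12:00:00.000400 IP 192.0.2.20.9997 > 192.0.2.10.40000: Flags [S.], seq 500, ack 101, win 65160, length 0
12:00:00.000500 IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [.], ack 501, win 502, length 0
12:00:00.020000 IP 192.0.2.20.9997 > 192.0.2.10.40000: Flags [R.], seq 501, ack 101, win 0, length 0
"""
hf_capture = """\
12:00:00.000200 IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [S], seq 100, win 64240, length 0
12:00:00.000300 IP 192.0.2.20.9997 > 192.0.2.10.40000: Flags [S.], seq 500, ack 101, win 65160, length 0
12:00:00.000600 IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [.], ack 501, win 502, length 0
12:00:00.019000 IP 192.0.2.10.40000 > 192.0.2.20.9997: Flags [R.], seq 101, ack 501, win 0, length 0
"""

# A reset that one side received but the other side never shows sending
# was most likely injected by something in the path.
only_seen_by_uf = rst_packets(uf_capture) - rst_packets(hf_capture)
only_seen_by_hf = rst_packets(hf_capture) - rst_packets(uf_capture)
print("RSTs only in the UF capture:", only_seen_by_uf)
print("RSTs only in the HF capture:", only_seen_by_hf)
```

In this made-up case each side sees a reset "from" the other that the other side's capture does not contain, which is the pattern described above.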


isoutamo
SplunkTrust
Can you also add the inputs / splunktcp output from the HF, not the UF?

slipinski
Path Finder

My bad, it's a typo. The second output is from the HF.

Anyway I'm pasting it here again: 

[splunktcp]
_rcvbuf = 1572864
acceptFrom = *
connection_host = ip
host = $decideOnStartup
index = default
route = has_key:_replicationBucketUUID:replicationQueue;has_key:_dstrx:typingQueue;has_key:_linebreaker:indexQueue;absent_key:_linebreaker:parsingQueue
[splunktcp://9997]
_rcvbuf = 1572864
connection_host = 10.24.118.91
disabled = false
host = $decideOnStartup
index = discol


gcusello
SplunkTrust

Hi @slipinski,

please try telnet on port 9997.
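If telnet isn't available on the UF host, the same check can be sketched with a few lines of Python. This is only an illustration: `can_connect` is a made-up helper name, and the demo below talks to a throwaway listener on localhost rather than to a real HF. Keep in mind that a successful connect only proves the TCP handshake completes; it says nothing about whether the receiver accepts forwarder data or whether the TLS settings match.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Demo against a temporary local listener (stand-in for the receiving port).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

open_result = can_connect("127.0.0.1", port)
listener.close()
closed_result = can_connect("127.0.0.1", port, timeout=1.0)

print("listener up:", open_result)
print("listener closed:", closed_result)
```

For the setup in this thread the call would be `can_connect("172.23.11.216", 9997)`, run from the UF.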

Ciao.

Giuseppe


slipinski
Path Finder

@gcusello I did, and it connected successfully.
