UF stops forwarding when splunk cloud is down

randy_moore · 12-02-2020

If you read the title, you are probably thinking "well, of course it does", but hear me out. (This is a long explanation that will hopefully answer the immediate questions.)

Background:
We have some on-prem UFs that forward "everything" to our on-prem Splunk Enterprise indexers AND specific logs to the indexer in our Splunk Cloud instance. In case you are wondering, the cloud instance is where our customer can look at their data without needing access to our internal systems.

Problem:
Splunk did some maintenance on our cloud instance, and while it was down, data from the UFs also stopped arriving at our on-prem Splunk. I can't figure out why the cloud instance being down would stop the forwarders from sending to Enterprise.

Checking the documentation here: https://docs.splunk.com/Documentation/Splunk/8.1.0/Forwarding/Setuploadbalancingd#Configure_universa...

It reads like the UFs should switch to the next indexer when one goes down. But they didn't. Instead, we saw this in the internal logs when the cloud instance was taken down for maintenance:

11-25-2020 21:59:48.139 -0600 WARN TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to output group splunkcloud has been blocked for 1200 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

Looking at inputs.conf and outputs.conf, I can see nothing wrong that would cause the data from these UFs to be blocked.

Sanitized inputs.conf. The last stanza (metrics.log) is the log that gets sent to both the on-prem instance (pp_indexers) and the cloud instance (splunkcloud):

[monitor://C:\blahblahblah\q2.log]
_TCP_ROUTING = pp_indexers
index = fsd
sourcetype = q2

[monitor://C:\blahblahblah\wrapper.log]
_TCP_ROUTING = pp_indexers
index = fsd_sandbox
sourcetype = wrapper

[monitor://C:\blahblahblah\metrics.log]
_TCP_ROUTING = pp_indexers,splunkcloud
index = fsd_sandbox
sourcetype = metrics

Sanitized outputs.conf:

defaultGroup = pp_indexers
forceTimebasedAutoLB = true
autoLBFrequency = 15

[tcpout:pp_indexers]
server = indexer1.ip.address.here:9997, indexer2.ip.address.here:9997

[tcpout:splunkcloud]
compressed = false
disabled = false
server = our_domain_name.cloud.splunk.com:9997
sslCommonNameToCheck = our_domain_name.cloud.splunk.com
sslCertPath = $SPLUNK_HOME/etc/apps/sanitized/client.pem
sslPassword = sanitized
sslRootCAPath = $SPLUNK_HOME/etc/apps/sanitized/cacert.pem
sslVerifyServerCert = true
useACK = true
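
One setting that looks relevant, going by outputs.conf.spec, is dropEventsOnQueueFull. As I read the spec, the default of -1 makes a full output queue block, and a blocked queue for any one target group stops data from reaching every other target group, which would match the warning above. Below is a minimal, untested sketch of what that might look like for the cloud group only. The value of 30 seconds is an arbitrary assumption, and the spec cautions against setting a positive value when monitoring files, which is exactly what these UFs do:

```
[tcpout:splunkcloud]
# Assumption: 30 is an illustrative value, not a tested or recommended one.
# Per outputs.conf.spec, a positive value means: wait this many seconds,
# then drop new events for this group instead of blocking the pipeline.
dropEventsOnQueueFull = 30
```

If this applied, events bound for splunkcloud would be dropped during an outage rather than queued, so it trades completeness on the cloud side for continuity on the on-prem side.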

Oh, and just in case you need it:
UF versions are 7.1.2 and 7.2.3.
Enterprise version is 7.3.4; Cloud is 7.3.
