I am operating in an environment with a standalone Splunk Enterprise instance running v8.1.3 on RHEL. In my environment I have around 350 Universal Forwarders that have been up and running for some time. I am running SSL on port 9997 between my forwarders and my Indexer. Certs being used are custom.
I recently have had a problem with two Universal Forwarders. They are not forwarding any information into Splunk.
In the Splunk GUI, they are appearing in Forwarder Management (and if I delete their entries, they reappear again), which looks good. I have two deployment apps pushed down to these forwarders as follows:
Both of the troublesome forwarders are on machines in a dmz and were installed by the same person.
I have looked through the logs on one of the forwarders (see attached PDF). From the logs, it would appear:
I can see the error message “TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to host_dest=<indexer_ip> inside output group default-autolb-group from host_src=<UF_server_hostname> has been blocked”, which appears to be relevant.
What I am trying to figure out is whether this is an issue with:
I've only just gotten back to looking at this issue. Thanks for your suggestions above.
I have managed to figure out what my issue was. As mentioned previously, these Universal Forwarders had been installed by someone outside our team and therefore had been installed slightly differently from the rest of our Universal Forwarders. On digging through the configurations more closely and comparing them to a known-working Universal Forwarder, I found the following:
When these Universal Forwarders had been installed, an instance of outputs.conf had been created in etc/system/local. This conf file contained the following:
defaultGroup=default-autolb-group
[tcpout:default-autolb-group]
server=<indexer_IP_address>
On all our other Forwarders we simply define the deployment server on the CLI at time of installation, which creates file etc/system/local/deploymentclient.conf.
The etc/system/local/outputs.conf configuration was defining an unencrypted (non-SSL) connection to the indexers rather than an SSL one. Worse, the config in this file was taking precedence over all the config that the deployment server was pushing down to the forwarder. My indexer_config deployment app was pushing the correct SSL configuration, but not all of its lines were being used.
Looking at my aggregated outputs.conf (using btool), the overall configuration was using defaultGroup=default-autolb-group, which pointed at the wrong tcpout stanza (the non-SSL one) in outputs.conf. I renamed the offending etc/system/local/outputs.conf file to outputs.bak, restarted splunkd, and it is all working now.
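For anyone hitting the same thing: in --debug mode, btool prefixes every output line with the .conf file that won the precedence merge, which is how a stray etc/system/local/outputs.conf shows up. A minimal sketch of post-processing that output to flag local overrides (the file paths and values below are invented for illustration, not taken from this environment):

```python
# "splunk btool outputs list --debug" prefixes each line with the source
# .conf file that won the precedence merge. Settings from etc/system/local
# override anything a deployment app pushes down. Sample text is invented.
sample = """\
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout]
/opt/splunkforwarder/etc/system/local/outputs.conf defaultGroup = default-autolb-group
/opt/splunkforwarder/etc/apps/indexer_config/default/outputs.conf [tcpout:splunkssl]
/opt/splunkforwarder/etc/apps/indexer_config/default/outputs.conf sslVerifyServerCert = true
"""

def settings_by_source(btool_debug_output):
    """Return (source_file, setting) pairs from btool --debug style output."""
    pairs = []
    for line in btool_debug_output.splitlines():
        if not line.strip():
            continue
        # The source path is everything before the first space.
        source, _, setting = line.partition(" ")
        pairs.append((source, setting.strip()))
    return pairs

# Settings supplied by etc/system/local trump the deployment apps.
local_overrides = [s for src, s in settings_by_source(sample)
                   if "/system/local/" in src]
print(local_overrides)
```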
Hi @mike_k,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the Contributors 😉
Further to the above information:
I can perform the following tests from powershell on the troublesome UF:
Test-NetConnection -ComputerName <indexer_ip> -Port 8089
Test-NetConnection -ComputerName <indexer_ip> -Port 9997
In each case it tells me “TcpTestSucceeded: True”, so I am happy that the UF can talk to both the Deployment Server (8089) and the Indexer (9997) through the firewall OK (though this is probably not testing the SSL connection to the indexer, just that it can reach port 9997).
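The caveat above is worth making concrete: a successful TCP connect proves nothing about the TLS handshake. A small self-contained sketch (local loopback listener, not this environment) showing that a plain-TCP endpoint passes a Test-NetConnection-style check yet fails a TLS handshake:

```python
# Demonstration: TCP connect succeeds against a plain (non-TLS) listener,
# but a TLS handshake against the same port fails - the analogue of
# TcpTestSucceeded=True while the SSL layer is still broken.
import socket
import ssl
import threading

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(5)
port = server.getsockname()[1]

def accept_and_close():
    conn, _ = server.accept()
    conn.close()  # plain TCP endpoint: accepts, then hangs up

# Plain TCP connect (the Test-NetConnection equivalent): succeeds.
threading.Thread(target=accept_and_close, daemon=True).start()
with socket.create_connection(("127.0.0.1", port), timeout=5):
    tcp_ok = True

# TLS handshake against the same port: fails, nothing speaks TLS there.
threading.Thread(target=accept_and_close, daemon=True).start()
ctx = ssl.create_default_context()
try:
    with socket.create_connection(("127.0.0.1", port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname="localhost"):
            tls_ok = True
except (ssl.SSLError, OSError):
    tls_ok = False

print(tcp_ok, tls_ok)
```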
Searches of similar “TcpOutputProc” errors have pointed at Indexer not accepting connections. I have also looked at my indexer:
Are there any other values on my indexer that I should check?
I was also wondering whether it could be an issue with certs or passwords on the troublesome UFs. If I run a “splunk btool outputs list” command on the troublesome UF, it appears to be using the correct certs (as downloaded in the indexer_config app).
Excerpt from splunk btool outputs list
[tcpout:splunkssl]
clientCert=$SPLUNK_HOME/etc/apps/indexer_config/certs/production_cert.pem
compressed=true
server=<indexer_ip>:9997
sslPassword=<encrypted_password>
sslRootCAPath=$SPLUNK_HOME/etc/apps/indexer_config/certs/production_ca_cert.pem
sslVerifyServerCert=true
Hi @mike_k,
did you try (only for debugging) the connection without SSL?
If it works without SSL, you can rule out network problems and concentrate on SSL; otherwise the problem is in the connection itself.
That said, I suspect the problem is in the SSL configuration.
Two stupid questions:
Ciao.
Giuseppe
Hi @gcusello ,
If I look at the config of the system, from what I can see, when they configured it for SSL they simply re-used port 9997 for SSL on the indexer (rather than leaving 9997 for cleartext and using 9998 for SSL). I'm assuming I'd have to configure a cleartext input on 9998 to test this?
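For reference, a temporary cleartext input alongside the existing SSL one would look roughly like this in the indexer's inputs.conf (the port number and cert path here are illustrative assumptions, not this environment's actual config):

```ini
# Sketch only - port 9998 and the cert path are assumptions.

# Existing SSL input stays on 9997:
[splunktcp-ssl:9997]
disabled = 0

[SSL]
serverCert = $SPLUNK_HOME/etc/auth/server.pem
sslPassword = <password>

# Temporary cleartext input for debugging on 9998:
[splunktcp://9998]
disabled = 0
```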
Looking at outputs.conf (using splunk btool outputs list --debug) on the troublesome Universal Forwarder (UF), I can see that the following lines are all taken from my indexer_config app, so all of these should be common across all UFs:
[tcpout:splunkssl]
clientCert=$SPLUNK_HOME/....
compressed=true
server=<indexer_ip>:9997
sslPassword=<encrypted_password>
sslRootCAPath=$SPLUNK_HOME/....
sslVerifyServerCert=true
so sslPassword should be ok.
Using the "splunk btool server list --debug" command on server.conf shows that the pass4SymmKey parameter is set by server.conf in system\local, so this password may be unique to this UF. From what I've read online, this is used for authentication between hosts. It isn't used by default for auth between deployment server and client (I don't think?), but is it used by default for authenticating a Universal Forwarder to an indexer? (i.e., could this be why the indexer is forcibly closing connections on 9997?)
Are there any other passwords I should check on the Universal Forwarder?
Regarding your question about searching the indexer's _internal index for logs from this UF: I did a search over the last 30 days. During this time I can see a brief burst of logs from when the UF was first installed on the server, and another brief burst a week later when the server was restarted (OS rebooted). Otherwise, no logs are being ingested into Splunk from this UF. Interestingly, I did a "splunk restart" today but have only seen a single log entry ingested (from splunk.version on the UF).
A UF's pass4SymmKey is used for authentication only if you are using indexer discovery to get the list of current indexers from the cluster manager. If/when you have defined those indexers in outputs.conf, that password is not used anywhere. If I understood your previous post correctly, you have an indexer_config app which sets those indexers in outputs.conf.
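For clarity, the indexer-discovery case where that key does matter looks roughly like this in outputs.conf (stanza names and URI are illustrative, and this thread's environment does not use it):

```ini
# Sketch only - illustrative names. With a static server= list, as in this
# thread, pass4SymmKey plays no part in the tcpout path.
[indexer_discovery:cm]
pass4SymmKey = <key shared with the cluster manager>
master_uri = https://cm.example.com:8089

[tcpout:discovered_peers]
indexerDiscovery = cm
```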
Was that sslPassword encrypted in your app on the DS, or was it clear text there and encrypted on the UF side at installation time? If the former, then the UF must have the same splunk.secret as your other nodes, especially the node where that password was encrypted. If it is in clear text on the DS, then splunk.secret can be anything.
r. Ismo
Hi @mike_k,
about the port you're using: you can choose whichever you like, 9997 or 9998; obviously it must be the same on both Indexers and Clients. You could also have some clients using SSL and some clients not using SSL, but they must use different ports.
In other words, you have to put the SSL configuration in the 9997 (or 9998, whichever you're using) input on the Indexers and in the corresponding output on the Clients.
Your configurations seem to be OK. Presumably you put the clear-text password in the file and Splunk encrypted it after the first restart, is that correct?
Check the configurations on the Indexers: I assume you have enabled SSL on the Indexers.
The documentation about this is at
https://docs.splunk.com/Documentation/Splunk/8.2.4/Security/Aboutsecuringdatafromforwarders
As for other passwords, the only relevant password is the SSL one.
Finally, about the logs in _internal: I suppose the first logs were received when you installed the Forwarder, before the SSL configuration; it's strange that you received logs after the SSL configuration.
For these reasons, I suggest disabling SSL to check whether the connection and input are OK; then you can troubleshoot SSL.
Ciao.
Giuseppe
Hi
stupid question, but did you restart those UFs after the app was installed?
r. Ismo
Hi @isoutamo
Looking back, the OS of the server was restarted roughly a week after it had been initially installed.
Just to be on the safe side I logged on and did a "splunk restart" on the troublesome Universal Forwarder today, but I am still getting the same results.