I am operating in an environment with a standalone Splunk Enterprise instance running v8.1.3 on RHEL. In my environment I have around 350 Universal Forwarders that have been up and running for some time. I am running SSL on port 9997 between my forwarders and my Indexer. Certs being used are custom.
I recently have had a problem with two Universal Forwarders. They are not forwarding any information into Splunk.
In the Splunk GUI, they are appearing in Forwarder Management (and if I delete their entries, they reappear again), which looks good. I have two deployment apps pushed down to these forwarders as follows:
Both of the troublesome forwarders are on machines in a dmz and were installed by the same person.
I have looked through the logs on one of the forwarders (see attached PDF). From the logs, it would appear:
I can see the error message “TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to host_dest=<indexer_ip> inside output group default-autolb-group from host_src=<UF_server_hostname> has been blocked”, which appears to be relevant.
What I am trying to figure out is whether this is an issue with:
I've only just gotten back to looking at this issue. Thanks for your suggestions above.
I have managed to figure out what my issue was. As mentioned previously, these Universal Forwarders had been installed by someone outside our team and therefore had been installed slightly differently from the rest of our Universal Forwarders. On digging through the configurations more closely and comparing them to a known-working Universal Forwarder, I found the following:
When these Universal Forwarders had been installed, an instance of outputs.conf had been created in etc/system/local. This conf file contained the following:
defaultGroup=default-autolb-group
[tcpout:default-autolb-group]
server=<indexer_IP_address>
On all our other Forwarders we simply define the deployment server on the CLI at time of installation, which creates file etc/system/local/deploymentclient.conf.
The etc/system/local/outputs.conf configuration was defining an unencrypted (non-SSL) connection to the indexers rather than an SSL one. Worse, the config in this file was taking precedence over all the config that the deployment server was pushing down to the forwarder. My indexer_config deployment app was pushing the correct SSL configuration, but not all of its lines were being used.
Looking at my aggregated outputs.conf (using btool), the overall configuration was using defaultGroup=default-autolb-group, which pointed at the wrong tcpout stanza (the non-SSL one) in outputs.conf. I renamed the offending etc/system/local/outputs.conf file to outputs.bak, restarted splunkd, and it is all working now.
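For anyone hitting the same thing: in --debug mode, btool prefixes every output line with the .conf file that won the precedence merge, which is how a stray etc/system/local/outputs.conf shows up. A minimal sketch of post-processing that output to flag local overrides (the file paths and values below are invented for illustration, not taken from this environment):

```python
# "splunk btool outputs list --debug" prefixes each line with the source
# .conf file that won the precedence merge. Settings from etc/system/local
# override anything a deployment app pushes down. Sample text is invented.
sample = """\
/opt/splunkforwarder/etc/system/local/outputs.conf [tcpout]
/opt/splunkforwarder/etc/system/local/outputs.conf defaultGroup = default-autolb-group
/opt/splunkforwarder/etc/apps/indexer_config/default/outputs.conf [tcpout:splunkssl]
/opt/splunkforwarder/etc/apps/indexer_config/default/outputs.conf sslVerifyServerCert = true
"""

def settings_by_source(btool_debug_output):
    """Return (source_file, setting) pairs from btool --debug style output."""
    pairs = []
    for line in btool_debug_output.splitlines():
        if not line.strip():
            continue
        # The source path is everything before the first space.
        source, _, setting = line.partition(" ")
        pairs.append((source, setting.strip()))
    return pairs

# Settings supplied by etc/system/local trump the deployment apps.
local_overrides = [s for src, s in settings_by_source(sample)
                   if "/system/local/" in src]
print(local_overrides)
```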
Hi @mike_k,
good for you, see you next time!
Ciao and happy splunking
Giuseppe
P.S.: Karma Points are appreciated by all the Contributors 😉
Further to the above information:
I can perform the following tests from powershell on the troublesome UF:
Test-NetConnection -ComputerName <indexer_ip> -Port 8089
Test-NetConnection -ComputerName <indexer_ip> -Port 9997
In each case it tells me “TcpTestSucceeded: True”, so I am happy that the UF can talk to both the Deployment Server (8089) and the Indexer (9997) through the firewall OK (though this is probably not testing the SSL connection to the indexer, just that it can reach port 9997).
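The caveat above is worth making concrete: a successful TCP connect proves nothing about the TLS handshake. A small self-contained sketch (local loopback listener, not this environment) showing that a plain-TCP endpoint passes a Test-NetConnection-style check yet fails a TLS handshake:

```python
# Demonstration: TCP connect succeeds against a plain (non-TLS) listener,
# but a TLS handshake against the same port fails - the analogue of
# TcpTestSucceeded=True while the SSL layer is still broken.
import socket
import ssl
import threading

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(5)
port = server.getsockname()[1]

def accept_and_close():
    conn, _ = server.accept()
    conn.close()  # plain TCP endpoint: accepts, then hangs up

# Plain TCP connect (the Test-NetConnection equivalent): succeeds.
threading.Thread(target=accept_and_close, daemon=True).start()
with socket.create_connection(("127.0.0.1", port), timeout=5):
    tcp_ok = True

# TLS handshake against the same port: fails, nothing speaks TLS there.
threading.Thread(target=accept_and_close, daemon=True).start()
ctx = ssl.create_default_context()
try:
    with socket.create_connection(("127.0.0.1", port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname="localhost"):
            tls_ok = True
except (ssl.SSLError, OSError):
    tls_ok = False

print(tcp_ok, tls_ok)
```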
Searches of similar “TcpOutputProc” errors have pointed at Indexer not accepting connections. I have also looked at my indexer:
Are there any other values on my indexer that I should check?
I was also wondering whether it could be an issue with certs or passwords on the troublesome UFs. If I run a “splunk btool outputs list” command on the troublesome UF, it appears to be using the correct certs (as downloaded in the indexer_config app).
Excerpt from splunk btool outputs list
[tcpout:splunkssl]
clientCert=$SPLUNK_HOME/etc/apps/indexer_config/certs/production_cert.pem
compressed=true
server=<indexer_ip>:9997
sslPassword=<encrypted_password>
sslRootCAPath=$SPLUNK_HOME/etc/apps/indexer_config/certs/production_ca_cert.pem
sslVerifyServerCert=true
Hi @mike_k,
did you try (only for debugging) the connection without SSL?
If it works without SSL, you can rule out network problems and concentrate on SSL; otherwise the problem is in the connection itself.
That said, I suspect the problem is in the SSL configuration.
Two stupid questions:
Ciao.
Giuseppe
Hi @gcusello ,
If I look at the config of the system, from what I can see, when they configured it for SSL they simply re-used port 9997 for SSL on the indexer (rather than leaving 9997 for cleartext and using 9998 for SSL). I'm assuming I'd have to configure a cleartext input on 9998 to test this?
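For reference, a temporary cleartext input alongside the existing SSL one would look roughly like this in the indexer's inputs.conf (the port number and cert path here are illustrative assumptions, not this environment's actual config):

```ini
# Sketch only - port 9998 and the cert path are assumptions.

# Existing SSL input stays on 9997:
[splunktcp-ssl:9997]
disabled = 0

[SSL]
serverCert = $SPLUNK_HOME/etc/auth/server.pem
sslPassword = <password>

# Temporary cleartext input for debugging on 9998:
[splunktcp://9998]
disabled = 0
```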
Looking at outputs.conf (using splunk btool outputs list --debug) on the troublesome Universal Forwarder (UF), I can see that the following lines are all taken from my indexer_config app, so all of these should be common across all UFs:
[tcpout:splunkssl]
clientCert=$SPLUNK_HOME/....
compressed=true
server=<indexer_ip>:9997
sslPassword=<encrypted_password>
sslRootCAPath=$SPLUNK_HOME/....
sslVerifyServerCert=true
so sslPassword should be ok.
Using the "splunk btool server list --debug" command on server.conf shows that the pass4SymmKey parameter is set by server.conf in system\local, so this password may be unique to this UF. From what I've read online, this is used for authentication between hosts. It isn't used by default for auth between deployment server and client (I don't think?), but is it used by default for authenticating a Universal Forwarder to an indexer? (i.e., could this be why the indexer is forcibly closing connections on 9997?)
Are there any other passwords I should check on the Universal Forwarder?
Regarding your question about searching the indexer's _internal index for logs from this UF: I did a search over the last 30 days. During this time I can see a brief burst of logs from when the UF was first installed on the server, and another brief burst a week later when the server was restarted (OS rebooted). Otherwise, no logs are being ingested into Splunk from this UF. Interestingly, I did a "splunk restart" today but have only seen a single log entry ingested (from splunk.version on the UF).
A UF's pass4SymmKey is used for authentication only if you are using indexer discovery to get the list of current indexers from the cluster manager. If/when you have defined those indexers in outputs.conf, that password is not used anywhere. If I understood your previous post correctly, you have an indexer_config app which sets those indexers in outputs.conf.
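For clarity, the indexer-discovery case where that key does matter looks roughly like this in outputs.conf (stanza names and URI are illustrative, and this thread's environment does not use it):

```ini
# Sketch only - illustrative names. With a static server= list, as in this
# thread, pass4SymmKey plays no part in the tcpout path.
[indexer_discovery:cm]
pass4SymmKey = <key shared with the cluster manager>
master_uri = https://cm.example.com:8089

[tcpout:discovered_peers]
indexerDiscovery = cm
```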
Was that sslPassword encrypted in your app on the DS, or was it clear text there and encrypted on the UF side at installation time? If the former, then the UF must have the same splunk.secret as your other nodes, especially the node where that password was encrypted. If it is in clear text on the DS, then splunk.secret can be anything.
r. Ismo
Hi @mike_k,
about the port you're using: you can choose whichever you like, 9997 or 9998; obviously it must be the same on both Indexers and Clients. You could also have some clients using SSL and some clients not using SSL, but they must use different ports.
In other words, you have to put the SSL configuration in the 9997 (or 9998, whichever you're using) input on the Indexers and in the corresponding output on the Clients.
Your configurations seem to be OK. Presumably you put the clear-text password in the file and Splunk encrypted it after the first restart, is that correct?
Check the configurations on the Indexers: I assume you have enabled SSL on the Indexers.
The documentation about this is at
https://docs.splunk.com/Documentation/Splunk/8.2.4/Security/Aboutsecuringdatafromforwarders
As for other passwords, the only relevant password is the SSL one.
Finally, about the logs in _internal: I suppose the first logs were received when you installed the Forwarder, before the SSL configuration; it's strange that you received logs after the SSL configuration.
For these reasons, I suggest disabling SSL to check whether the connection and input are OK; then you can troubleshoot SSL.
Ciao.
Giuseppe
Hi
stupid question, but did you restart those UFs after the app was installed?
r. Ismo
Hi @isoutamo
Looking back, the OS of the server was restarted roughly a week after it had been initially installed.
Just to be on the safe side I logged on and did a "splunk restart" on the troublesome Universal Forwarder today, but I am still getting the same results.