Hello Team,
A few of our HFs were configured to send logs to syslog-ng, a local server used for log storage. After upgrading the certificates on those forwarders, logs stopped coming into Splunk. Everything works fine on forwarders that are not configured to send data to syslog-ng.
We tried removing the syslog-ng config from the HF settings, but still no data is coming in.
Any ideas or thoughts on this? Perhaps someone has had a similar issue before. Is a cert upgrade needed on the syslog-ng server as well?
Thanks in advance.
Muhammad Murad
Thank you, everyone. The issue was resolved after I removed the syslog-ng configuration under the local directory. In other words, all forwarders that have a syslog-ng/additional outputs.conf entry will have this issue after the certificate upgrade. Once the config was removed, logs flowed normally. What we are going to do now is consider using DDSS instead of the syslog-ng output.
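For reference, the stanza removed from $SPLUNK_HOME/etc/system/local/outputs.conf was the syslog output group (the same one visible in the btool dump further down this thread):

[syslog:kr_syslogng_group]
server = 10.126.137.234:514
type = tcp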
You typically don't use certs with plain syslog, so it's a bit confusing where the certs and encryption are used in your setup in the first place.
Check the usual suspects. Do a btool dump of the outputs config. Verify the network configuration. Run a tcpdump and see if any packets are being sent. Run a tcpdump on the receiving server as well...
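For example, something like this (the interface, host, and ports are placeholders - 514 and 9997 are just the typical syslog and Splunk forwarding ports, adjust to your environment):

/opt/splunk/bin/splunk btool outputs list --debug
# on the HF - is anything leaving towards the syslog-ng server?
tcpdump -nn -i any host <syslog-ng-ip> and port 514
# on the HF - is anything leaving towards the indexers?
tcpdump -nn -i any port 9997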
Agreed. That's why we're a little confused that only forwarders with a syslog-ng config have the issue after the cert upgrade. The others, which don't have the syslog-ng setting, are working fine.
We did check the network communication and everything looks good. Connection successful - established.
We will check the btool dump of the outputs config and run tcpdump on both servers to see whether any logs are being sent/received.
That's why I'd suspect this is a coincidence with a completely different problem. Maybe someone made a config change but didn't restart the splunkd process, and the change was only applied now, along with your cert change?
Potentially, but only 3 forwarders have this issue, and only these 3 have the outputs.conf setting to forward data to syslog-ng. All 3 are under our team, and it's not very likely someone made changes without notifying us.
We tried removing the config just to be sure, and that also did not solve the issue. Hence, we need some insight here.
Thank you.
OK. I re-read your initial question and I'm a bit confused. From what I understand, you have some forwarders. They send (or at least are supposed to send) the events to your indexer(s). Three of them also have another output defined - a syslog one sending the events to your syslog-ng server.
And now what happened? You "upgraded the certification". What does that mean? Does it mean that your HF->idx connection used to be unencrypted and now you configured encryption or did you simply renew your certificates? And how broad was the change? Across all your forwarders? Did you change anything on the input side? (like configuring encryption or renewing the certificates if the traffic was already encrypted)
And what does work now and what doesn't? Because it's not very obvious - do the inputs work? Which outputs don't work? What do you have on those HFs in splunkd.log?
It happened after I renewed the certificates as suggested by Splunk. All our forwarders needed to download and renew the certificates. The other forwarders have no issue; only the 3 forwarders with the additional outputs.conf sending to the syslog-ng server are affected.
In other words, once we renewed the certificates on those forwarders (the ones with the additional outputs.conf sending to syslog-ng), logs stopped flowing to Splunk Cloud.
No changes were made on the input side. I believe the UFs are sending data to the HF, but the HF is not forwarding the data to Splunk. Currently we re-route the logs through other forwarders that don't have the syslog-ng config, and we can see the logs coming in as normal.
I am also a little confused, since we checked and confirmed that the connection between the HF and the syslog-ng server is established. It all happened right after renewing the certificate.
Is there any additional config needed for forwarders that have an additional outputs.conf to syslog after renewing the cert?
Internal logs are coming in normally from the HF to the Cloud.
Hi
You probably had some special configuration in your old outputs.conf which handled that two-way routing you set up manually? Have you just installed SC's certificate app, or have you also updated your local outputs.conf to use those new certs?
The best way to check which outputs.conf settings are in use is to run
splunk btool outputs list --debug
That tells you which configurations (like certs) are in use and in which files they are defined.
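If you have local cert overrides, they typically show up in the btool output looking something like this (the stanza name and path here are just examples, not your actual values):

[tcpout:your_cloud_group]
clientCert = $SPLUNK_HOME/etc/auth/mycerts/client.pem
sslPassword = <encrypted password>
sslVerifyServerCert = true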
r. Ismo
Hello,
Thanks. The syslog-ng config was configured in outputs.conf under local, as suggested by Splunk previously. We just renewed the certificate; we did not do anything in outputs.conf to use those new certs.
How/what do we need to update in outputs.conf to use the new certs? Are there any links explaining this? I can try, because we did not touch anything in the config; we just renewed the certificates and the issue happened.
Below is the btool result:
[splunk@ip-10-125-17-91 bin]$ /opt/splunk/bin/splunk btool outputs list --debug
/opt/splunk/etc/system/local/outputs.conf [indexAndForward]
/opt/splunk/etc/system/local/outputs.conf index = false
/opt/splunk/etc/system/default/outputs.conf [syslog]
/opt/splunk/etc/system/default/outputs.conf maxEventSize = 1024
/opt/splunk/etc/system/default/outputs.conf priority = <13>
/opt/splunk/etc/system/default/outputs.conf type = udp
/opt/splunk/etc/system/local/outputs.conf [syslog:kr_syslogng_group]
/opt/splunk/etc/system/local/outputs.conf server = 10.126.137.234:514
/opt/splunk/etc/system/local/outputs.conf type = tcp
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf [tcpout]
/opt/splunk/etc/system/default/outputs.conf ackTimeoutOnShutdown = 30
/opt/splunk/etc/system/default/outputs.conf autoLBFrequency = 30
/opt/splunk/etc/system/default/outputs.conf autoLBVolume = 0
/opt/splunk/etc/system/default/outputs.conf blockOnCloning = true
/opt/splunk/etc/system/default/outputs.conf blockWarnThreshold = 100
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf channelReapInterval = 60000
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf channelReapLowater = 10
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf channelTTL = 300000
/opt/splunk/etc/system/default/outputs.conf cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
/opt/splunk/etc/system/default/outputs.conf compressed = false
/opt/splunk/etc/system/default/outputs.conf connectionTTL = 0
/opt/splunk/etc/system/default/outputs.conf connectionTimeout = 20
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf defaultGroup = splunkcloud_20220309_2a3a6bb51c7c7db014655a134c893643
/opt/splunk/etc/system/default/outputs.conf disabled = false
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf dnsResolutionInterval = 300
/opt/splunk/etc/system/default/outputs.conf dropClonedEventsOnQueueFull = 5
/opt/splunk/etc/system/default/outputs.conf dropEventsOnQueueFull = -1
/opt/splunk/etc/system/default/outputs.conf ecdhCurves = prime256v1, secp384r1, secp521r1
/opt/splunk/etc/system/default/outputs.conf forceTimebasedAutoLB = false
/opt/splunk/etc/system/default/outputs.conf forwardedindex.0.whitelist = .*
/opt/splunk/etc/system/default/outputs.conf forwardedindex.1.blacklist = _.*
/opt/splunk/etc/system/default/outputs.conf forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
/opt/splunk/etc/system/default/outputs.conf forwardedindex.filter.disable = false
/opt/splunk/etc/system/default/outputs.conf heartbeatFrequency = 30
/opt/splunk/etc/system/local/outputs.conf indexAndForward = 1
/opt/splunk/etc/system/default/outputs.conf maxConnectionsPerIndexer = 2
/opt/splunk/etc/system/default/outputs.conf maxFailuresPerInterval = 2
/opt/splunk/etc/system/default/outputs.conf maxQueueSize = auto
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf negotiateNewProtocol = true
/opt/splunk/etc/system/default/outputs.conf readTimeout = 300
/opt/splunk/etc/system/default/outputs.conf secsInFailureInterval = 1
/opt/splunk/etc/system/default/outputs.conf sendCookedData = true
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf socksResolveDNS = false
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf sslPassword = $7$5EfPkE9EnHQx12YOSI1Kwga9fflT5fyblj/wzzHLgOdmxoHsfAbg0VQueyWoX11ovoWt1TIaefQfIoT/kZkGLUY3nqhb6doWv9h8xg267wL4egu0QWjXKT7WTt/j7sub
/opt/splunk/etc/system/default/outputs.conf sslQuietShutdown = false
/opt/splunk/etc/system/default/outputs.conf sslVersions = tls1.2
/opt/splunk/etc/system/default/outputs.conf tcpSendBufSz = 0
/opt/splunk/etc/system/default/outputs.conf useACK = false
/opt/splunk/etc/apps/100_amway_splunkcloud/local/outputs.conf useClientSSLCompression = true
/opt/splunk/etc/system/default/outputs.conf writeTimeout = 300
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf [tcpout:scs]
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf clientCert = $SPLUNK_HOME/etc/apps/100_amway_splunkcloud/default/amway_server.pem
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf compressed = true
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf disabled = 1
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf server = amway.forwarders.scs.splunk.com:9997
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf sslAltNameToCheck = *.forwarders.scs.splunk.com
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf sslVerifyServerCert = true
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf useClientSSLCompression = false
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf [tcpout:splunkcloud_20220309_2a3a6bb51c7c7db014655a134c893643]
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf clientCert = $SPLUNK_HOME/etc/apps/100_amway_splunkcloud/default/amway_server.pem
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf compressed = false
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf server = inputs1.amway.splunkcloud.com:9997, inputs2.amway.splunkcloud.com:9997, inputs3.amway.splunkcloud.com:9997, inputs4.amway.splunkcloud.com:9997, inputs5.amway.splunkcloud.com:9997, inputs6.amway.splunkcloud.com:9997, inputs7.amway.splunkcloud.com:9997, inputs8.amway.splunkcloud.com:9997, inputs9.amway.splunkcloud.com:9997, inputs10.amway.splunkcloud.com:9997, inputs11.amway.splunkcloud.com:9997, inputs12.amway.splunkcloud.com:9997, inputs13.amway.splunkcloud.com:9997, inputs14.amway.splunkcloud.com:9997, inputs15.amway.splunkcloud.com:9997
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf sslCommonNameToCheck = *.amway.splunkcloud.com
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf sslVerifyServerCert = true
/opt/splunk/etc/apps/100_amway_splunkcloud/default/outputs.conf useClientSSLCompression = true
[splunk@ip-10-125-17-91 bin]$
It seems there are two different tcpout definitions for sending events to SC (Splunk Cloud): the disabled [tcpout:scs] stanza and the active [tcpout:splunkcloud_20220309_...] group, which is the defaultGroup.
First, I'd check splunkd.log on those HFs and verify that the UFs are properly connecting to the HFs.
Since the internal logs are getting ingested properly, the connection between the HFs and the indexers must be working, so check the "previous" step in the event path. Furthermore, if the problem were only in the output from the HF to the indexers, your syslog output should still be working. If it isn't, that strongly suggests your TLS change broke something "before" the HFs.
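For example, a quick way to spot TLS/input/output errors on the HF (TcpInputProc, TcpOutputProc, and SSLCommon are just the usual suspect components; adjust the path if your install differs):

grep -iE "WARN|ERROR" /opt/splunk/var/log/splunk/splunkd.log | grep -iE "TcpInputProc|TcpOutputProc|SSLCommon"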
Thank you. I will arrange a time to point those UFs back at the problematic HF and get the splunkd logs. Due to this issue, we are routing the logs through another HF for now.
TLS broken - is there any link that explains this further, including the resolution steps? I will also check these points.
In order to resolve the problem, you must first know what it is. For now, we only know that _internal logs seem to be getting ingested properly from the "problematic" HFs, which means it's most probably not an HF output issue. It might be an input issue. We don't know how your inputs are configured on those HFs or how your UFs are configured.
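You can dump the effective input settings the same way as the outputs - run it on both the HF and one of the UFs (on the HF, the [splunktcp://...] or [splunktcp-ssl:...] stanza is what the UFs connect to):

/opt/splunk/bin/splunk btool inputs list --debug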