Hi,
We recently upgraded the Heavy Forwarders (HFs) in our Splunk Enterprise deployment. Since the upgrade, the Universal Forwarders (UFs) have stopped sending data (e.g. Linux logs) to the HFs over HTTP, and the logs are not searchable on the Search Head.
We upgraded from v9.1.2 to v9.3.0. We also tried v9.3.1, which made no difference: logs are still not being sent.
v9.2.3 works without issues.
I checked the logs on a UF running v9.3.x and can see:
ERROR S2SOverHttpOutputProcessor [8340 parsing] - HTTP 503 Service Unavailable
However, I cannot figure out what causes the issue. Telnet from the UF to the HF works, and telnet from the HF to the indexers also works. The tokens on the Deployment Server and the UFs are the same.
Please advise.
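Telnet only proves that a TCP handshake to the port succeeds; the 503 in the UF log is an HTTP-level response that the receiving side sends after the connection is already established. A minimal Python sketch of an HTTP-level probe, assuming the HF's HTTP Event Collector listens on the given base URL and exposes a health endpoint at `/services/collector/health` (an assumption here; verify the path in the HEC documentation for your version). The function name `probe_http` is hypothetical.

```python
import ssl
import urllib.error
import urllib.request


def probe_http(base_url: str, path: str = "/services/collector/health",
               timeout: float = 5.0, insecure: bool = False) -> int:
    """Return the HTTP status code the endpoint answers with.

    Unlike telnet, this completes a full HTTP request/response exchange,
    so a 503 from the receiver shows up here even when the TCP handshake works.
    The default path is an assumption; adjust it for your Splunk version.
    """
    context = None
    if insecure:
        # Diagnostics only: accept self-signed certificates on the HF.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code itself is what we want to see.
        return err.code
```

For example, `probe_http("http://<ip-of-hf>:8088")` run from the UF host should return 503 if the HF is refusing requests at the HTTP layer; use an `https://` base URL (with `insecure=True` for self-signed certificates) if that is what the UF's `uri` uses.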
OK. So you have the logs from the UFs, but did you check splunkd.log on those HFs?
Yes, I checked splunkd.log on the HFs. I could not see anything relevant or useful.
It's unlikely, but not impossible, that your particular setup triggers a bug in the software.
What I would do:
1) Compare the pre- and post-upgrade configs to see whether anything changed.
2) Do a fresh reinstall of 9.1 on the hosts where 9.3 wasn't working and reapply the config.
3) If you have the means, spin up a fresh indexer with an HTTP input and point that UF at the new indexer.
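One way to do step 1, assuming a copy of `$SPLUNK_HOME/etc` from before the upgrade exists (or can be taken from the 9.1 reinstall in step 2): diff every .conf file between the two trees. A minimal Python sketch using the standard `difflib` module; the function name and paths are hypothetical. Output of `splunk btool <conf> list` captured on each version and saved to files in two directories could be compared the same way to check the effective, merged settings.

```python
import difflib
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    # A file missing on one side diffs as empty, so added/removed files show up too.
    return path.read_text(errors="replace").splitlines(keepends=True) if path.is_file() else []


def diff_conf_trees(old_root: str, new_root: str, pattern: str = "*.conf") -> list[str]:
    """Return unified-diff lines for every .conf file that differs between two config trees."""
    old, new = Path(old_root), Path(new_root)
    rel_paths = sorted(
        {p.relative_to(old) for p in old.rglob(pattern)}
        | {p.relative_to(new) for p in new.rglob(pattern)}
    )
    diff: list[str] = []
    for rel in rel_paths:
        diff.extend(difflib.unified_diff(
            _read_lines(old / rel), _read_lines(new / rel),
            fromfile=str(old / rel), tofile=str(new / rel),
        ))
    return diff
```

Usage, with a hypothetical backup location: `print("".join(diff_conf_trees("/tmp/etc-9.1.2", "/opt/splunk/etc")))`. An empty result means no .conf file under the two trees differs.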
If no obvious reason pops up, just raise a case with Splunk Support.
What OS do you have these installed on?
The Splunk nodes, including the heavy forwarders, run on RHEL 8; the universal forwarders are mostly on Linux as well.
Is it sending too much data, including its own logs? My guess is that the receiving endpoint is busy, since a 503 means the server is temporarily unable to handle the request. Did you try sending a small batch of test events from one of those Linux servers?
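A small test batch can be sent straight to the HF's HTTP Event Collector, bypassing the UF. The sketch below uses the HEC event endpoint (`/services/collector/event` with an `Authorization: Splunk <token>` header) and batches events as concatenated JSON objects; the base URL, token, host and the `hec_smoke_test` sourcetype are placeholders. Note that this exercises the HEC input on the HF, not the UF's S2S-over-HTTP output path, so a success here narrows the problem down rather than ruling the HF out.

```python
import json
import urllib.error
import urllib.request


def build_batch(messages: list[str], host: str, sourcetype: str = "hec_smoke_test") -> bytes:
    """Serialise messages as concatenated HEC event objects, one JSON object per line."""
    return "\n".join(
        json.dumps({"event": msg, "host": host, "sourcetype": sourcetype}) for msg in messages
    ).encode("utf-8")


def send_batch(base_url: str, token: str, payload: bytes, timeout: float = 10.0) -> tuple[int, str]:
    """POST the batch to the HEC event endpoint and return (status, response body)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/services/collector/event",
        data=payload,
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Keep the body: HEC error responses explain why the request was rejected.
        return err.code, err.read().decode("utf-8", "replace")
```

For example, `send_batch("http://<ip-of-hf>:8088", "<token>", build_batch(["test 1", "test 2"], "uf01"))`. A 503 here as well would point at the HF itself rather than at the UF.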
Review these settings if you haven't already:
Check outputs.conf
Verify inputs.conf
https://docs.splunk.com/Documentation/Splunk/9.3.1/Admin/Outputsconf#HTTP_Output_stanzas
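To sanity-check the UF side, the [httpout] stanza can be read programmatically. A rough sketch, assuming outputs.conf parses as plain INI (Splunk's conf format mostly does, but backslash line continuations and similar edge cases are not handled). The `uri` and `httpEventCollectorToken` settings come from the HTTP output stanza documentation linked above; the function name and problem messages are made up for illustration.

```python
import configparser


def check_httpout(conf_text: str) -> list[str]:
    """Return a list of obvious problems in the [httpout] stanza of an outputs.conf text.

    Assumes the file parses as plain INI; Splunk-specific syntax such as
    backslash line continuations is not handled.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",),   # Splunk uses '=' only; keeps 'http://host:8088' intact
        interpolation=None,  # values may contain '%' characters
        strict=False,        # tolerate repeated stanzas/keys instead of raising
    )
    parser.optionxform = str  # keep camelCase keys like httpEventCollectorToken
    # Settings before the first stanza belong to Splunk's implicit [default] stanza.
    parser.read_string("[default]\n" + conf_text)

    if not parser.has_section("httpout"):
        return ["no [httpout] stanza"]
    stanza = parser["httpout"]
    problems = [f"missing {key}" for key in ("httpEventCollectorToken", "uri")
                if not stanza.get(key, "").strip()]
    uri = stanza.get("uri", "").strip()
    if uri and not uri.startswith(("http://", "https://")):
        problems.append(f"uri has no http:// or https:// scheme: {uri}")
    return problems
```

Usage: `check_httpout(open("/opt/splunkforwarder/etc/system/local/outputs.conf").read())`, where the path is only an example; the stanza may live in an app deployed from the Deployment Server instead.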
Thanks for your reply, @sainag_splunk.
I have done some tests and checks.
Regarding load: I do not think it is too much data. I increased the number of heavy forwarders from 2 to 4 and it made no difference.
Regarding TLS/SSL:
The UF host supports:
SSLv3
TLSv1
TLSv1.2
TLSv1.3
The load balancer (LB), which the HFs sit behind, supports TLS 1.2 and 1.3.
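To confirm which protocol versions the LB (or an HF directly) actually negotiates, rather than what each side is configured to support, each version can be pinned in turn and a handshake attempted. A minimal sketch using Python's standard `ssl` module; the function names are hypothetical, and certificate verification is deliberately disabled because the goal is only to see which versions complete a handshake. SSLv3 is not probed, since current OpenSSL builds cannot offer it, and TLS 1.0/1.1 may also be refused locally depending on the OpenSSL security level.

```python
import socket
import ssl

# Versions to try, oldest first. SSLv3 is left out: current OpenSSL builds cannot offer it.
CANDIDATE_VERSIONS = (
    ssl.TLSVersion.TLSv1,
    ssl.TLSVersion.TLSv1_1,
    ssl.TLSVersion.TLSv1_2,
    ssl.TLSVersion.TLSv1_3,
)


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Client context that will only negotiate exactly `version`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostics only: we care which versions complete a handshake, not certificate trust.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def negotiated_versions(host: str, port: int, timeout: float = 5.0) -> list[str]:
    """Return the protocol names (e.g. 'TLSv1.2') the endpoint accepted, one handshake per version."""
    accepted = []
    for version in CANDIDATE_VERSIONS:
        try:
            ctx = pinned_context(version)  # may raise ValueError if the local OpenSSL lacks this version
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    accepted.append(tls.version())
        except (ssl.SSLError, OSError, ValueError):
            # Handshake refused, connection failed, or version unavailable locally.
            continue
    return accepted
```

Running `negotiated_versions("<lb-address>", 8088)` from the UF host, and then against an HF directly, shows whether the versions actually negotiated match the TLS 1.2/1.3 support described above. This only applies to `https://` URIs; the plain `http://` URI used in the direct test below involves no TLS.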
To eliminate the LB, I pointed the UF directly at the HF by changing outputs.conf as follows:
uri = http://<ip-of-hf>:8088
It did not work in the environment with UF v9.3.1 and HF v9.3.1; I got the same error.
Telnet from the UF to the HF on port 8088 worked.
However, pointing directly at the HF did work in the environment with UF v9.3.1 and HF v9.1.2.
I also noticed that restarting the UF in the affected environment is very slow: it takes 4-5 minutes, whereas in the environment without issues it takes a couple of seconds.
The output and input configs look similar in both environments.