I am seeing something strange. My Heavy Forwarder (CentOS 7 Linux) running Splunk 6.2.5 periodically stops forwarding the data that is sent to it via a UDP:514 rsyslog stream.
Weirdly, the same Heavy Forwarder is still sending in its own data for the Linux App. When I restart the splunkd service, the forwarder starts forwarding again.
When I check the _internal index on the separate Search Head, though, I see no data at all for this Heavy Forwarder. Should I? Also, why does this HF stop forwarding data?
My "solution" was to set up an rsyslog server on the same machine as the Heavy Forwarder, have it write the incoming data to files, and then use the HF to read those files.
It looks like, although the Splunk port listener is implemented in and supported by Splunk, the Best Practice is to set up a Syslog server to receive the data and then ingest the files in Splunk. The main advantage is that the Syslog server is pretty solid and NEVER needs to be restarted (unlike Splunk, which needs to be restarted to apply configuration changes), and it gets us around issues like this!
My first thought is: have you confirmed that, during the periods when you think data is not being forwarded, it is actually reaching the forwarder in the first place? UDP does not guarantee delivery, and datagrams may end up dumped on the floor in times of congestion or failure.
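One rough way to test that, offered as a sketch rather than a recommended tool: send numbered syslog-style datagrams at the forwarder and then look for gaps in the sequence numbers that get indexed. The function names, the `udptest` tag and the `seq=` field are all made up for illustration; nothing here is a Splunk or rsyslog API.

```python
import socket
import time


def build_message(seq, tag="udptest"):
    """Build one syslog-style payload carrying a sequence number."""
    # <14> is the syslog priority for user.info (facility 1 * 8 + severity 6).
    return f"<14>{tag}: seq={seq}".encode("ascii")


def send_numbered(host, port, count, delay=0.0):
    """Send `count` numbered datagrams; return how many were handed to the OS.

    A successful sendto() only means the local stack accepted the datagram,
    not that the receiver ever saw it.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(build_message(seq), (host, port))
            sent += 1
            if delay:
                time.sleep(delay)
    return sent


def missing_sequences(received, count):
    """Return the sequence numbers in range(count) that never arrived."""
    seen = {int(msg.rsplit(b"seq=", 1)[1]) for msg in received if b"seq=" in msg}
    return [seq for seq in range(count) if seq not in seen]
```

For example, `send_numbered("hf.example.com", 514, 100, delay=0.1)` (the host name is a placeholder) followed by a search for `udptest` on the indexer would show whether any sequence numbers are missing.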
You're better off using a proper syslog service over TCP and locally ingesting the log files it generates. (As it happens, this was also Splunk's official advice last time I looked.) My standing recommendation is syslog-ng, for its flexibility of configuration.
Thanks for your response.
It MAY not be arriving in the first place, but we have a device that sends a UDP:514 ping at the Heavy Forwarder every 5 seconds (we have a very paranoid Network Administrator). Before I restart the service, nothing is forwarded, but AS SOON as splunkd is restarted, the UDP:514 pings start being logged. Either I am very lucky and restart Splunk on the Heavy Forwarder at exactly the right moment every time, or the data IS being received and the Forwarder is not forwarding it on.
I did suggest using TCP and syslog-ng, but the Network Admins said that the extra overhead is not wanted (but strangely pinging every 5 seconds is not an issue - go figure?)
At the moment, I have an email Alert that runs every 5 minutes to check whether we are ingesting any forwarded data from the Heavy Forwarder. However, I would like to know why splunkd stops forwarding the data, and to get Splunk to be reliable!
Also, the only gaps in the "ps" data are at the times when I restarted Splunk on that server. So why am I not getting any _internal index data from the Heavy Forwarder, when I am getting constant "ps" information throughout the outages?
Well, again, you're assuming that the Heavy Forwarder is at fault in not forwarding. I'm still suspicious of the UDP service. It could be that the system is dropping it in favour of something else, with the service restart prompting a reset.
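One way to separate "the system is dropping the datagrams" from "splunkd has stopped reading its socket" is to look at the kernel's per-socket UDP counters in /proc/net/udp while the problem is occurring. The following is a minimal sketch, assuming the usual Linux column layout (local address and port in hex as the second field, tx_queue:rx_queue as the fifth, drops as the thirteenth); `udp_socket_stats` is a hypothetical helper name, not part of any tool.

```python
def udp_socket_stats(proc_text, port):
    """Return receive-queue size and drop count for UDP sockets bound to `port`.

    `proc_text` is the full text of /proc/net/udp.
    """
    stats = []
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 13:
            continue  # older kernels may not expose the drops column
        # local_address looks like "00000000:0202"; the port is hexadecimal.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # Fifth field is "tx_queue:rx_queue", both hexadecimal.
        rx_queue = int(fields[4].split(":")[1], 16)
        stats.append({"rx_queue_bytes": rx_queue, "drops": int(fields[12])})
    return stats


# Typical use on the forwarder host (assumes Linux):
#     with open("/proc/net/udp") as handle:
#         print(udp_socket_stats(handle.read(), 514))
```

If the drops counter climbs, or rx_queue stays full, on the port 514 socket during an outage, that would suggest the datagrams are reaching the host but are not being read off the socket.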
Interesting - a SMOKING GUN! I had a look through $SPLUNK_HOME/var/log/splunk/splunkd.log (and the .* files) and found some warnings logged during the times when splunkd was not forwarding data:
splunkd.log:11-12-2015 13:00:13.144 +0000 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Mon Aug 10 13:14:20 2015). Context: source::udp:514|host::10.31.50.5|syslog|
I only get this warning during the periods when I don't get any forwarded data! As soon as splunkd is restarted, these warnings disappear!
To me, it seems unlikely that these devices are changing the date format in their logs; it looks more like Splunk is getting into trouble and generating these warnings.
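To check that correlation more systematically, the warnings can be counted per minute and compared against the outage windows. This is only a sketch: it assumes splunkd.log timestamps are month-day-year, as the line above appears to be, and `warnings_per_minute` is a hypothetical helper name. The pattern uses a search rather than a match from the start of the line, so grep output with a `splunkd.log:` prefix is also accepted.

```python
import re
from collections import Counter
from datetime import datetime

# Matches e.g. "11-12-2015 13:00:13.144 +0000 WARN DateParserVerbose - ..."
LINE_RE = re.compile(
    r"(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>\w+)\s+(?P<component>\w+) - "
)


def warnings_per_minute(lines, component="DateParserVerbose"):
    """Count log lines from `component`, bucketed by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("component") != component:
            continue
        # Assumption: the date is month-day-year.
        stamp = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
        counts[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

Feeding it the lines of splunkd.log and its rotated copies gives a per-minute count that can be laid alongside the times the email Alert fired.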