We have seen several cases where a syslog message (via UDP) is sent to our Splunk server, but never shows up in the search, or only shows up an hour or more later. In the latter case, I am guessing there is an issue with the performance of the jobs doing the indexing.
In a recent case where a syslog message never showed up in the Splunk search, I took a packet capture from the switch to which the server is directly connected and confirmed the syslog message was sent to the server. The server's network interface counters show no dropped packets or errors.
Can anyone provide advice (or documentation) on how I can troubleshoot these two issues? Is there a way to monitor the jobs doing the indexing and index-time data lookups? Is there any record of what messages have been received by Splunk's syslog server process that I can check to see if this missing message was received or what might have caused it to be dropped?
My colleague found a bit more detail on what is going on here. It seems that in these cases, the syslog message is sent from the device and received by the server, but does not make it to the job doing the indexing. After additional syslog messages are sent from the same device, the missing one will show up (but the last one sent from that device may wind up missing until subsequent messages are sent).
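One way to reproduce this in a controlled way would be to send numbered test messages over UDP and then check which sequence numbers appear in the search. The following is only a minimal sketch: the hostname `test-device`, the tag `seqtest`, and the function names are hypothetical placeholders, and the RFC 3164-style format is an assumption about what the real devices send.

```python
import socket
import time


def build_syslog_message(seq, hostname="test-device", tag="seqtest", pri=134):
    """Build an RFC 3164-style line; pri 134 is facility local0, severity info."""
    t = time.localtime()
    # RFC 3164 pads single-digit days with a space rather than a zero.
    timestamp = f"{time.strftime('%b', t)} {t.tm_mday:2d} {time.strftime('%H:%M:%S', t)}"
    return f"<{pri}>{timestamp} {hostname} {tag}: test message seq={seq}"


def send_test_messages(host, port, count, delay=0.0):
    """Send `count` numbered messages, one per UDP datagram, and return them."""
    sent = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(1, count + 1):
            message = build_syslog_message(seq)
            sock.sendto(message.encode("ascii"), (host, port))
            sent.append(message)
            if delay:
                time.sleep(delay)
    return sent
```

Calling something like `send_test_messages("splunk.example.com", 514, 5)` (the host name here is a placeholder) and then searching for `seqtest` should show whether the highest sequence number is the one that stays missing until another message follows it.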
From what I have read in the Splunk documentation, it seems that a buffer may be allocated for the stream of data coming from a given host, and that the data is segmented into discrete syslog messages only once it is handed off to another thread. So I suspect that the last message does not fill enough of the buffer to trigger that handoff, and sits there indefinitely until additional data from that host arrives.
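To make the suspected behavior concrete, here is a toy model of a per-host buffer that only releases a message once a later one from the same host arrives. This is purely an illustration of the hypothesis and of the symptom my colleague observed; it is not Splunk's actual implementation, and the class and method names are made up.

```python
class HoldLastBuffer:
    """Toy model: each host's newest message is held until the next one arrives."""

    def __init__(self):
        self._held = {}  # host -> most recent message not yet released

    def feed(self, host, message):
        """Accept a message and return any earlier message this releases."""
        released = []
        previous = self._held.get(host)
        if previous is not None:
            released.append(previous)
        self._held[host] = message
        return released

    def pending(self, host):
        """Return the message still waiting for this host, or None."""
        return self._held.get(host)


buf = HoldLastBuffer()
first = buf.feed("switch-1", "link down")   # nothing released yet
second = buf.feed("switch-1", "link up")    # releases "link down"
stuck = buf.pending("switch-1")             # "link up" is still waiting
```

In this model the final message from a quiet device never reaches the indexing stage on its own, which matches what we see: it only shows up after the device sends something else.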
Is anyone able to provide any additional guidance on this? We are planning on switching over to using a separate syslog server, but I would still like to understand this behavior.
Hi there @rberse
You might find the answers on this previous question useful for your case. It's more on best practices with ingesting syslog into Splunk than troubleshooting missing syslog messages, but something to consider.
Thanks for your response, @ppablo. I had a look and this is good to know. In this case, we are using Splunk to listen directly on UDP 514 (and may change this to syslog-ng using TCP later). Splunk was running at the time, and I was able to confirm that the UDP packet made it to the server.
Any suggestions on what data is logged that might be helpful in figuring out what happened?
No problem. Unfortunately, I'm not sure how to dig deeper into finding out what happened other than looking at Splunk internal logs around the time the syslog messages were sent. There are a good number of experts floating around Answers though, so I'm sure someone else in the Splunk community will come around to help provide other troubleshooting tips. I've just found that 90% of questions asked about this kind of issue on Answers usually end up pointing back to that previous Q&A 😛
Good luck and I hope you find an answer soon!
This blog post by @starcher is one of the answers on that post, but just wanted to highlight that since it's a pretty comprehensive overview on the topic: