Getting Data In

Missing events when forwarding syslog to a heavy forwarder

w199284
Explorer

I need help troubleshooting an issue where I am missing events forwarded from a Linux syslog daemon to my heavy forwarders. Beginning on the first day of each month, for three or four days, this feed drops from ~50,000 indexed events per hour to maybe ~150. Then, just as suddenly, the feed resumes at ~50,000 events per hour for the remainder of the month. Only this one source/index is affected. All traffic is UDP.
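
The drop is easy to see by charting hourly indexed volume for the affected index with something like this (the index name here is just a placeholder):

  | tstats count where index=my_syslog_index by _time span=1h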

To troubleshoot:

  • I've removed the load balancer from the equation and now send directly to one heavy forwarder.
  • We can see the syslog events leaving the source server.
  • Using tcpdump, I can see events from the source server hitting port 514 on the heavy forwarder (capture command sketched below this list).
  • I have a dashboard showing blocking on the agg, index, parsing, and typing queues (search also sketched below). There is none.
  • I tested the regexes in my transforms against the actual events captured with tcpdump. All match correctly.
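
For reference, those two checks boil down to roughly the following; the interface, host, and source IP are placeholders for my environment:

  # On the heavy forwarder: confirm packets from the source arrive on 514/udp
  tcpdump -ni eth0 udp port 514 and host <source_ip>

  # Queue-blocking search behind the dashboard (metrics.log on the HWF)
  index=_internal host=<hwf_host> source=*metrics.log* group=queue blocked=true
  | timechart count by name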

While the event was in progress:

  • I opened a support case with diags from the HWF and one of my indexer servers (nothing back yet).
  • There are no errors or warnings in the internal logs of the heavy forwarder used in this test.
  • I've looked at all the log channels on the HWF (1,236 of them) but I don't know which one(s) to raise the logging level for (see the sketch after this list).
  • I tried starting Splunk with --debug (splunk start --debug) but I do not see any additional internal logging; I may not have done this correctly.
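
For anyone who can point me at the right channels: my understanding is that individual channels can be raised at runtime with the CLI, or persistently via log.cfg/log-local.cfg. The channel name is exactly what I don't know, so the one below is only a placeholder:

  # Runtime change on the HWF (not persistent across restarts)
  $SPLUNK_HOME/bin/splunk set log-level <ChannelName> -level DEBUG

  # Find candidate channel names for the UDP/parsing path
  grep -iE 'udp|regex|typing|aggregat' $SPLUNK_HOME/etc/log.cfg

  # Persistent alternative: add "category.<ChannelName>=DEBUG" to
  # $SPLUNK_HOME/etc/log-local.cfg and restart splunkd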

Stranger still:

  • Two syslog feeds from the source server are being used in this troubleshooting effort; the second feed is unaffected.
  • There are 78 source servers in this group, and all exhibit the same behavior, which makes Splunk look like the common denominator.

I do use a props and transforms configuration on port 514 to set the index and sourcetype for a multitude of incoming syslog feeds bound for different indexes. This configuration has not changed in a very long time (and certainly does not change for a few days at the start of each month).
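
For context, that routing configuration is shaped roughly like this; the stanza names, regexes, index, and sourcetype below are illustrative placeholders, not my actual values:

  # props.conf on the HWF
  [source::udp:514]
  TRANSFORMS-routing = set_index_fw, set_sourcetype_fw

  # transforms.conf
  [set_index_fw]
  REGEX = %ASA-\d-\d+
  DEST_KEY = _MetaData:Index
  FORMAT = network_fw

  [set_sourcetype_fw]
  REGEX = %ASA-\d-\d+
  DEST_KEY = MetaData:Sourcetype
  FORMAT = sourcetype::cisco:asa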

Frankly, I'm lost. There must be a way to expose what is happening to these events, either at the heavy forwarder or on the indexers, but I'm out of ideas. Does anyone have a thought on how I might capture the information I need to diagnose this? At the moment the feed has returned to normal, i.e. ~50,000 indexed events per hour. Thank you in advance for any advice you have.
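
When the problem recurs next month, the raw checks I plan to run on the HWF, to show whether packets are being dropped before splunkd ever sees them, look like this (again, the interface and source IP are placeholders):

  # Kernel-level UDP counters - watch whether the error/drop counters climb
  netstat -su

  # Count packets that actually reach the NIC over a fixed window
  timeout 60 tcpdump -ni eth0 -w /tmp/port514.pcap udp port 514 and host <source_ip>
  tcpdump -nr /tmp/port514.pcap | wc -l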