After working with Splunk on this, it came down to two issues with the way the Splunk UF is currently designed.

First, when the Splunk service starts, if it can't get a response from the Event Log within 30 seconds, it stops trying to collect Windows events until the service is restarted. I have found this can happen when a server is rebooted while applying patches, because the final reboot can be delayed. The Splunk service starts at that point, and if the request times out you get no collection of the Windows Event Logs, as there is currently no automatic retry after the 30 seconds. The workaround is to change the Splunk UF service to Automatic (Delayed Start) to try to overcome this.

The second issue is with the Windows Event Log capture directive `evt_resolve_ad_obj=1`. If the Splunk UF needs to resolve an AD SID that is not already cached and the lookup times out (for example, because a Domain Controller was rebooting at the time), the Splunk UF stops capturing any more events for that Event Log until it is restarted. Again, it doesn't retry automatically or continue on to the next event entry. The workaround is to set `evt_resolve_ad_obj=0` so it doesn't try to resolve any unknown SIDs.

You won't know either of these has occurred unless you are monitoring the data in your indexes for each host and checking whether the Event Log data is still arriving.

Splunk informed us that the behaviors we are experiencing are due to the current design of the product, so fixing them would come under enhancement requests. The case technician has submitted two feature requests on our behalf:

1. EID-I-2424: Implement a retry mechanism or allow configurable timeout settings to address the 30-second initialization timeout for Windows event log data collection in Splunk Universal Forwarder. https://ideas.splunk.com/ideas/EID-I-2424
2. EID-I-2425: Enhance the `evt_resolve_ad_obj=1` setting to skip or retry unresolved Security Identifiers (SIDs) instead of halting event log collection when SID resolution fails. https://ideas.splunk.com/ideas/EID-I-2425
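
For anyone applying the `evt_resolve_ad_obj` workaround, here is a minimal sketch of what it might look like in inputs.conf on the forwarder. The Security stanza is only an assumed example; the same setting would go under whichever WinEventLog stanzas you actually collect.

```
# inputs.conf on the Universal Forwarder
# Assumption: the Security log is the affected input; adjust the stanza name to match yours.
[WinEventLog://Security]
disabled = 0
# Do not try to resolve AD objects such as SIDs, so a failed lookup cannot stall this input.
evt_resolve_ad_obj = 0
```

The trade-off is that events will show raw SIDs instead of resolved names. A restart of the forwarder is normally needed for the inputs.conf change to take effect.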