I'm having an intermittent problem with the Universal Forwarder (UF): it occasionally fails to start the WinEventLog component after a system restart. This is happening on a number of servers, and we only started seeing it after upgrading them to Windows Server 2016. When the service starts, it logs these two lines:
06-23-2019 04:44:20.122 +0000 ERROR ModularInputs - Unable to initialize modular input "WinEventLog" defined in the system context: Introspecting scheme=WinEventLog: script running failed (exited with code 255).
06-23-2019 04:44:19.575 +0000 ERROR ModularInputs - Introspecting scheme=WinEventLog: killing process, because executing it took too long (over 30000 msecs).
When this happens, other inputs continue to read events. For example, data for _internal, stream and others continues to be sent from this system, but nothing is processed from the Event Log. Restarting the Splunk UF service on the server instantly fixes the problem, so I know it's not a problem with inputs.conf or anything else. It simply seems that some component fails to start up within 30 seconds and Splunk gives up on it. The fact that this happens intermittently on the same system (on some restarts everything is fine, on others this happens) supports this. Things I tried:
Changing the service to Delayed Start - No change. I found some obscure documentation saying that in Server 2016 Microsoft configures services launched with Delayed Start to run at the lowest priority. https://blogs.technet.microsoft.com/askperf/2008/02/02/ws2008-startup-processes-and-delayed-automatic-start/ . Relevant quote: "The Service Control manager also sets the priority of the initial thread for these delayed services to THREAD_PRIORITY_LOWEST. This causes all of the disk I/O performed by the thread to be very low priority."
Upgraded from 7.1.3 to 7.2.x - No change
Opened a ticket with support - There are no tunable parameters for this. Turning on debug logging for this module ("category.ModularInputs=DEBUG") did not reveal any additional helpful information.
The only idea I have left is to brute-force this and add a scheduled task that restarts the service 10-15 minutes after a system restart. Before I do this, are there any suggestions from the community?
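For reference, this is roughly what I have in mind for the brute-force workaround. It is only a sketch: instead of restarting blindly, it looks in splunkd.log for the "Unable to initialize modular input" error from the recent past and restarts the service only when it finds one. The log path, the service name `SplunkForwarder`, the 20-minute lookback and the function name are my assumptions based on a default UF install, so adjust them to your environment.

```python
import os
import re
import subprocess
from datetime import datetime, timedelta, timezone

# Assumptions (default UF install); change these to match the real system.
SPLUNKD_LOG = r"C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log"
SERVICE_NAME = "SplunkForwarder"
LOOKBACK = timedelta(minutes=20)

# Matches the "Unable to initialize" line and captures its timestamp,
# e.g. "06-23-2019 04:44:20.122 +0000".
FAILURE = re.compile(
    r'^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) '
    r'ERROR ModularInputs - Unable to initialize modular input "WinEventLog"'
)


def wineventlog_failed_since(lines, since):
    """Return True if a WinEventLog init failure was logged at or after `since`.

    `since` must be a timezone-aware datetime.
    """
    for line in lines:
        match = FAILURE.match(line)
        if not match:
            continue
        logged_at = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S.%f %z")
        if logged_at >= since:
            return True
    return False


def main():
    if not os.path.exists(SPLUNKD_LOG):
        print("splunkd.log not found, nothing to do")
        return
    since = datetime.now(timezone.utc) - LOOKBACK
    with open(SPLUNKD_LOG, encoding="utf-8", errors="replace") as log:
        failed = wineventlog_failed_since(log, since)
    if failed:
        # "net stop" / "net start" wait for the service state change to complete.
        subprocess.run(["net", "stop", SERVICE_NAME], check=False)
        subprocess.run(["net", "start", SERVICE_NAME], check=True)
        print("WinEventLog input failed at startup, service restarted")
    else:
        print("No recent WinEventLog failure found")


if __name__ == "__main__":
    main()
```

The idea would be to run this from a scheduled task triggered at system startup with a 10-15 minute delay, under an account that is allowed to restart services. Checking the log first means the forwarder is left alone on the boots where everything came up fine.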