
Does anyone know of a few Windows event logs to monitor in Splunk for system crashes and errors?

Path Finder

Hi Splunkers!

I’d like to pick your brains: do you know of 3-5 key Windows event log events to monitor that would indicate that a machine has crashed or is having trouble with a particular component (application, hardware, driver, etc.)? I’m working on a set of alerts in Splunk for my program to help them maintain their uptime SLAs.

I’m looking to search in Splunk for a simple text string, event id, error code, or pattern that would indicate that a system has gone down or is degraded (i.e. something is failing).

I’ve done some research. Here’s what I’ve got so far:


System Log, Event ID: 41, Source: Microsoft-Windows-Kernel-Power

Description: “The system has rebooted without cleanly shutting down first.”
Kernel-Power event ID 41 is logged when the computer shuts down or restarts unexpectedly, i.e. when the system fails to shut down and restart gracefully. A likely cause is that the operating system stopped responding and crashed, or that the server lost power.


System Log, Event ID: 6008, Source: EventLog

Description: “The previous system shutdown at <time> on <date> was unexpected.” This event ID tells you that the system started up after it was not shut down properly.


System Log, Event ID: 18, Source: Microsoft-Windows-WHEA-Logger

Description: “A fatal hardware error has occurred.” This error indicates that there is a hardware problem.


Application Log, Level: Error, Source: Application Error

Description: Tracks applications that have crashed or faulted on the system.


System Log, Event ID: 7000, Source: Service Control Manager

Description: “The service failed to start due to the following error: ”. This error is logged when a service fails to start normally.
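
To pin down the matching logic before turning it into a Splunk search, here is a rough sketch of the five signatures above as a lookup table in Python. The dictionary keys (log, source, event_id, level) and the short labels are placeholders I made up for illustration; they are not Splunk field names, and the real field names would depend on how the Windows event logs are ingested.

```python
from typing import Optional

# Signatures taken from the list above. Keys are (log, source); each value is
# (event ID to match, short label). An event ID of None means "any Error-level
# event from that source". All field names and labels here are hypothetical.
SIGNATURES = {
    ("System", "Microsoft-Windows-Kernel-Power"): (41, "unclean reboot"),
    ("System", "EventLog"): (6008, "unexpected shutdown"),
    ("System", "Microsoft-Windows-WHEA-Logger"): (18, "fatal hardware error"),
    ("Application", "Application Error"): (None, "application crash"),
    ("System", "Service Control Manager"): (7000, "service failed to start"),
}


def classify(event: dict) -> Optional[str]:
    """Return a short label if the event matches a signature, else None."""
    rule = SIGNATURES.get((event.get("log"), event.get("source")))
    if rule is None:
        return None
    event_id, label = rule
    if event_id is None:
        # Application Error is matched on level only, as in the list above.
        return label if event.get("level") == "Error" else None
    return label if event.get("event_id") == event_id else None


if __name__ == "__main__":
    sample = {"log": "System", "source": "EventLog", "event_id": 6008}
    print(classify(sample))
```

Anything that does not match one of the five rows comes back as None, so the same table could later be extended with more event IDs without touching the matching function.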

Any thoughts?

SplunkTrust

Hello there,
First, I would like to say that I think you are on the right track: your research is valid and you have found good events.
Windows logs are verbose and there is plenty to look for. Having said that, I have seen many Windows admins and ops people look at different events for the same or similar use cases.
Before I continue, I highly recommend consulting your Windows admins / SMEs and asking them what they see most often and what is important for them to be alerted on.
Of the many online sources on this subject, these two links are pretty good. I chose them because they also add a security aspect to the mix:
http://www.redblue.team/2015/09/spotting-adversary-with-windows-event.html
http://www.redblue.team/2015/09/spotting-adversary-with-windows-event_21.html
From an operational perspective, I have often found WinHostMon (Windows host monitoring) very useful as a source, either on top of the system logs or by itself.

Hope it helps.
