What is the best approach to Windows Event Log alerting?


I would like to know if anyone is using Splunk as the primary alerting engine for Windows Event Logs. We have several hundred servers and index the Windows Event Logs of each one, but we currently have no alerting set up for concerning events. I can envision that eventually we may want several hundred "alert conditions", but I am unsure of the best way to do this. My thought is to set up a scheduled search every x minutes (5 perhaps?) where the match condition for the search is a lookup table of event IDs. The output of the lookup would then include a field that describes what the problem may be. Is anyone doing anything similar? Does anyone have other ideas on how to use Splunk as an effective alerting tool, not just an awesome search engine for errors?
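To make the lookup-table idea concrete, here is a minimal Python sketch of the matching logic only. It is not Splunk code: the `EVENT_LOOKUP` table, the `match_events` helper, and the sample events are all hypothetical and chosen for illustration. In Splunk this logic would presumably live in a scheduled search joined against a lookup.

```python
# Hypothetical lookup table: Windows event ID -> description of the likely problem.
# The IDs are examples only; a real table would be maintained per organization.
EVENT_LOOKUP = {
    4625: "An account failed to log on",
    4740: "A user account was locked out",
    1102: "The audit log was cleared",
}


def match_events(events, lookup):
    """Return one alert record for each event whose ID appears in the lookup."""
    alerts = []
    for event in events:
        description = lookup.get(event["event_id"])
        if description is not None:
            alerts.append(
                {
                    "host": event["host"],
                    "event_id": event["event_id"],
                    "description": description,
                }
            )
    return alerts


# Sample events standing in for the results of one scheduled run (assumed shape).
sample_events = [
    {"host": "srv01", "event_id": 4625},
    {"host": "srv02", "event_id": 7036},  # not in the lookup, so ignored
    {"host": "srv03", "event_id": 1102},
]

alerts = match_events(sample_events, EVENT_LOOKUP)
for alert in alerts:
    print(f'{alert["host"]}: {alert["event_id"]} - {alert["description"]}')
```

Note that this sketch fires on the mere presence of an event ID, with no threshold or time window.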



Yes, lots of us drive alerting off of Splunk, or through Splunk to another application. But your idea for an overall architecture is probably too much of a one-size-fits-all approach. Just because an event exists -- let's say a password failure -- doesn't mean that anything is wrong.

Not all alerts are created equal. Think of an alert as a report that you want someone to receive, whose purpose is to draw their attention to something. The urgency of the "something" -- and how specifically it indicates a departure from the normal range -- will determine how often you need to check and alert on it.

1) Tell me within a day if any machine in group XYZ went over 80% disk usage the previous day.
2) Tell me within X hours (or YY minutes) if a host becomes unavailable or stops reporting.
3) Tell me monthly if any user goes over X hours of working from home, or if any user visits certain types of websites more than X times in the 30-day period.
4) Tell me within X minutes (or YY seconds) if the number of occurrences of event X per second goes outside of normal bounds.
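The four examples above differ mainly in how often the check runs and what counts as abnormal. Here is a rough Python sketch of that idea, not a Splunk configuration. The `AlertRule` class, the interval values, and the three-standard-deviation definition of "normal bounds" are all assumptions made for illustration; the original leaves X and YY unspecified.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class AlertRule:
    name: str
    interval_minutes: int  # how often the check runs (placeholder values below)


# Hypothetical schedules mirroring examples 1-4; the numbers are assumptions.
RULES = [
    AlertRule("disk usage over 80% in group XYZ", 24 * 60),
    AlertRule("host stopped reporting", 60),
    AlertRule("monthly usage report", 30 * 24 * 60),
    AlertRule("event rate out of normal bounds", 5),
]


def out_of_bounds(history, current, num_stdevs=3.0):
    """Return True when `current` is more than `num_stdevs` sample standard
    deviations away from the mean of `history`.

    This is one simple definition of "normal bounds", assumed for the sketch.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a normal range
    return abs(current - mean(history)) > num_stdevs * stdev(history)


# Events per second observed in earlier intervals (made-up numbers).
history = [10, 12, 11, 9, 10, 11]

print(out_of_bounds(history, 40))  # a spike far above the usual rate
print(out_of_bounds(history, 12))  # within the usual range

most_urgent = min(RULES, key=lambda rule: rule.interval_minutes)
print(most_urgent.name)
```

Keeping the schedule separate from the condition makes it easier to adjust how often a check runs without touching what it checks.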

Sometimes there will be events that are probably just normal goofs, but over the medium term you have to check for patterns to make sure it's not some kind of attack (whether outsider or insider). Not this second or minute, but also not a month from now.

@adonio's suggestions are right on. You will build up your knowledge base and your arsenal of alerts over time, creating, adjusting, and retiring them as your business cases dictate and as what is "normal" for your organization changes.

Stick to the "Agile" way of thinking for now. Implement early and often, and adjust and reorient as you learn. Don't worry about getting it perfect. Done is better than perfect, and Splunk is easy enough to modify as you go along.


Hello rnavis,
First, build a search that matches the KPI, SLA, or threshold you want to be alerted on. Then set up an alert by saving that search as an alert.
There are almost 100 prebuilt reports and searches in the App for Windows Infrastructure, which can be a good place to look for reference searches and reports. More on this app here:

P.S. There are many other apps for Windows data on Splunkbase. Download them and explore their searches.

Hope it helps.
