We are evaluating Splunk 4, and one thing our management team wants to know is whether Splunk can help us collect specific event log data from 11,000 Windows XP devices.
The purpose, for example, would be to identify all devices that have not logged a reboot event in the past 7 days so that we can alert on that, or to alert on disk errors.
From what I've read, Splunk 4 can use WMI to get the event logs, but I'm not sure whether it can collect just specific types of events yet...
Is this something we could realistically do with minimal impact on the network and devices? Given the numbers, would we need multiple Splunk servers at various sites collecting the event logs and then forwarding them up? If so, does anyone have an idea of approximately how many Splunk forwarders would be needed to accommodate 11,000 devices? Most geographical sites have about 2,000 devices each.
Is there a better solution I should be looking at that will collect the data and feed it into Splunk for analysis?
To Splunk, Windows Event Logs are just another kind of event. Yes, you can put any sort of Windows event log data into Splunk, and you can create searches relatively easily to achieve the sorts of goals you outline.
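To make the "no reboot in 7 days" goal concrete, here is a rough Python sketch of the logic such a search would need to implement. It is not Splunk search syntax. The function name, the sample hosts, and the choice of event codes 6005/6009 (written by the EventLog service at startup) as "reboot" markers are my assumptions, so check which codes your XP builds actually log. Note that it needs the full device inventory as input, because a host that has sent nothing at all would otherwise never show up.

```python
from datetime import datetime, timedelta

# Assumption: event codes treated as evidence of a boot. Adjust for your environment.
REBOOT_EVENT_CODES = {6005, 6009}


def hosts_without_recent_reboot(all_hosts, events, now, window=timedelta(days=7)):
    """Return hosts with no reboot event inside the window.

    all_hosts: iterable of every device name we expect to hear from.
    events:    iterable of (host, event_code, timestamp) tuples.
    """
    cutoff = now - window
    rebooted = {
        host
        for host, code, ts in events
        if code in REBOOT_EVENT_CODES and ts >= cutoff
    }
    # Hosts that logged nothing at all are also reported.
    return sorted(set(all_hosts) - rebooted)


if __name__ == "__main__":
    now = datetime(2010, 7, 15, 12, 0)
    inventory = ["xp-001", "xp-002", "xp-003"]  # hypothetical device names
    sample_events = [
        ("xp-001", 6005, datetime(2010, 7, 14, 8, 0)),  # rebooted yesterday
        ("xp-002", 6005, datetime(2010, 7, 1, 8, 0)),   # last reboot two weeks ago
        ("xp-002", 7, datetime(2010, 7, 14, 9, 0)),     # some other event, not a reboot
    ]
    print(hosts_without_recent_reboot(inventory, sample_events, now))
    # prints ['xp-002', 'xp-003']
```

In Splunk itself, the device inventory would have to come from somewhere outside the event data, for example a list of hosts you maintain separately.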
Splunk installed natively on a system as a forwarder can perform Windows event log data collection and forwarding. Splunk can acquire all events from the logs. This would involve installing Splunk on the Windows XP devices. If you were to pursue this approach, I would guesstimate that you would want at least 1 receiving Splunk node per 1,000 forwarders or so, purely to manage the number of open network connections. Forwarders can be configured to automatically distribute themselves across a group of receivers or to send to specific nodes (for the regional office case/goal).
Alternatively, as you identify, Splunk can acquire the data over WMI. Similarly, this is not restricted in the types of events acquired. The WMI subsystem provided by Windows consumes memory in proportion to the number of hosts. Because the behavior is typically memory limited, the RAM available on the WMI pollers is an important criterion.
For a rough datapoint, one customer with 16GB WMI pollers is servicing approximately 120 hosts per WMI poller. Another customer was able to achieve higher numbers, closer to 300. Because the limiting factors are in an operating system subsystem, we're still learning how it scales with factors such as data volume, network speed, etc. We've seen cases where the WMI subsystem is a bit brittle when overloaded, so it's generally desirable to test to determine the capacity in your environment and then run below it.
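To put those rules of thumb against the numbers in the question, here is a back-of-the-envelope calculation in Python. The ratios are only the rough figures above (about 1,000 forwarders per receiver, and 120 to 300 hosts per WMI poller), and the helper name is mine, so treat the output as a starting point for testing rather than a sizing guarantee.

```python
import math

TOTAL_DEVICES = 11_000             # from the question
DEVICES_PER_SITE = 2_000           # "most sites have about 2,000 devices"
FORWARDERS_PER_RECEIVER = 1_000    # rough guess for the forwarder approach
WMI_HOSTS_PER_POLLER = (120, 300)  # the two customer datapoints


def nodes_needed(devices, per_node):
    """Round up: a partially loaded node is still a node."""
    return math.ceil(devices / per_node)


if __name__ == "__main__":
    for label, devices in (("all devices", TOTAL_DEVICES), ("one site", DEVICES_PER_SITE)):
        receivers = nodes_needed(devices, FORWARDERS_PER_RECEIVER)
        pollers_worst = nodes_needed(devices, WMI_HOSTS_PER_POLLER[0])
        pollers_best = nodes_needed(devices, WMI_HOSTS_PER_POLLER[1])
        print(f"{label}: {receivers} receivers (forwarder approach), "
              f"{pollers_best} to {pollers_worst} WMI pollers")
```

For 11,000 devices this works out to 11 receivers with the forwarder approach, versus somewhere between 37 and 92 WMI pollers, before leaving any headroom for running WMI below capacity.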
Historically, some customers have used Snare to acquire Windows event log data and send it to Splunk over syslog. From a Splunk perspective, this is a bit more work, and the events carry a bit less data, but the point is that other ways of transmitting Windows event log data to Splunk are also viable, should you already have something in place to accomplish this.
Ok - here is a different route...
We have Microsoft SCOM (well - we will do - currently migrating from MOM) installed. It has agents on all the Windows systems that collect data/events/etc. and forward them to a central SCOM system.
I see from here http://www.splunkbase.com/apps/All/4.x/app:System+Center+Operations+Manager+%28SCOM%29+integration that I can get data from SCOM into Splunk.
However, I was under the impression that only certain events are forwarded to SCOM for an alert, depending upon what you set up.
Does anyone know if SCOM can collect all the events, which can then be sucked in by Splunk? Or is it only some of them?
If it is only some, I would need to install Snare as well as SCOM agents on all the platforms...
Matt
I moved from this to a separate question here:
http://answers.splunk.com/questions/4785/windows-event-log-collection-via-microsoft-scom-2007
Basically it looks like if I want all the events in there then I have to use a Splunk install as a forwarder - with the option of making it a Light Forwarder after it's configured.
There are a number of tools which run on Windows and convert events to syslog messages.
Two I'm aware of are Snare and LogLogic Lasso.
Your events can then be piped into Splunk as syslog, and you should be able to do some filtering before they are sent to reduce noise.
EDIT: Apologies Jrodman, just noticed you had already mentioned snare.
Hi rictersmith!
You decide which event logs to collect by picking the event log channel(s) (Security, System ...) during Splunk setup. You can filter further by event ID and event type using transforms. There are several questions/answers related to that topic.
BTW, the Windows team jumped in with some corrections. The per-category resource use is a thing of the past.
Awesome, thanks for the details, and just in time for my meeting with the company's Director in 12 minutes.