Monitoring hundreds of metrics and/or configuration items: how to do it efficiently?

Contributor

I'm trying to use Splunk to monitor both runtime metrics and configuration state of a server application like JBoss or SQL Server. My goal is to test each application against a set of "known good" tests and report on which server software is out of compliance.

If I just wanted to monitor one metric (e.g. free disk space > 10%) or one configuration setting (e.g. the value of a particular Windows registry key) this would be easy in Splunk via a saved search.

But operations teams can sometimes document a "known good" definition that includes tens or even hundreds of checks for a single piece of server software.

What's the most efficient way using Splunk to do hundreds of different tests against logged metrics or configuration data? I assume that having hundreds of saved searches is not the best way to do this. 🙂

1 Solution

Splunk Employee

There's probably a better way this could be built into Splunk, but here's how I would go about it.

First, let's assume that every config is just a file, and it's all on one known host. Then I would set up fschange monitoring on all the relevant files on the "good" server, and have it generate a hash (set hashMaxSize large enough to include all your files).
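For the file case, the fschange setup could be sketched in inputs.conf along these lines (the monitored path here is purely illustrative; check the inputs.conf spec for your Splunk version for the exact settings and defaults):

```ini
# Hypothetical config directory -- substitute your application's paths.
[fschange:/opt/appserver/conf]
recurse = true
pollPeriod = 600
# Large enough that every monitored file gets a content hash.
hashMaxSize = 1048576
```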

Next, we schedule and run a search to generate a lookup table:

host=goodhost sourcetype=fs_notification | rename hash as goodhash | dedup path | fields path,goodhash | outputlookup goodconfig.csv

On the clients, we also monitor the relevant files with fschange. Then we could do:

host!=goodhost sourcetype=fs_notification | dedup host,path | lookup goodconfig.csv path OUTPUT goodhash | where hash!=goodhash

This assumes the file paths are the same on every host, though you could use "eval" to apply string transformations to the paths as needed.

Now this could be extended to object types other than files, as long as we had a way to list object IDs (the file path, in the case of files) that match between the "good" source and the "questionable" source, and a way to generate a key from the object contents (such as the fschange input or a custom scripted input). In general, if you used consistent field names (e.g., objectPath, goodHash) when writing out the "check" results, and kept to a limited number of sourcetypes, it should be easy to keep the lookup table generation and the searches simple.
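Such a scripted input might look like the following minimal sketch. The directory, and the objectPath/objectHash field names, are illustrative assumptions, not Splunk-defined conventions; the point is just to emit one key=value event per object with a stable content hash:

```python
#!/usr/bin/env python
# Sketch of a custom scripted input: emit one event per config object,
# carrying an object ID (here, the file path) and a content hash.
# CONFIG_DIRS and the field names objectPath/objectHash are hypothetical.
import hashlib
import os
import time

CONFIG_DIRS = ["/opt/appserver/conf"]  # hypothetical location


def object_hash(path):
    """Return a SHA-256 hex digest of the file's raw contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def emit_events(dirs):
    """Print key=value events that Splunk's automatic extraction can parse."""
    ts = time.strftime("%Y-%m-%dT%H:%M:%S")
    for d in dirs:
        for root, _, files in os.walk(d):
            for name in files:
                path = os.path.join(root, name)
                print('%s objectPath="%s" objectHash=%s'
                      % (ts, path, object_hash(path)))


if __name__ == "__main__":
    emit_events(CONFIG_DIRS)
```

The same lookup-table approach as above would then apply, with objectPath playing the role of path and objectHash the role of hash.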


Splunk Employee

My presumption was that the configs are pushed out from the "main" server, and that any difference is significant. If this is not the case, the generalized solution of using a scripted input rather than fschange would allow you to use a hashing function that (e.g.) ignores whitespace or normalizes line-endings or whatever.
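As one sketch of that idea, a scripted input could normalize the config text before hashing, so cosmetic differences don't flag a host as out of compliance. The normalization rules below are illustrative; choose ones that are safe for your config format:

```python
# Hash config text after normalizing line endings and dropping
# trailing whitespace and blank lines, so only meaningful edits
# change the hash. Rules here are examples, not a standard.
import hashlib


def normalized_hash(text):
    """Return a SHA-256 hex digest of whitespace-normalized config text."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    lines = [ln.rstrip() for ln in lines if ln.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```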


Splunk Employee

Clever, I like it! Breaks unless the config files are generated programmatically with an absolutely consistent format, though.


Splunk Employee

Is it possible to treat the entire dumped config state as one multiline event? (I.e., are the lines consecutive?) If so, a single (admittedly hairy) saved search could then provide a binary pass/fail answer, or even a numeric "how many checks failed" answer or a string "which checks failed" answer with eval ... if(X,Y,Z).
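The per-check pass/fail logic described here can be sketched outside SPL as well. The check names and thresholds below are made up for illustration; each entry mirrors one eval-style if(X,Y,Z) test against a parsed config:

```python
# Sketch of "how many checks failed / which checks failed" against a
# parsed config dict. CHECKS maps a hypothetical check name to a
# predicate, like one if(X, Y, Z) clause per check in a saved search.
CHECKS = {
    "free_disk_pct_ok": lambda cfg: cfg.get("free_disk_pct", 0) > 10,
    "audit_enabled": lambda cfg: cfg.get("audit") == "on",
}


def run_checks(cfg):
    """Return (failed_count, failed_check_names) for one server's config."""
    failed = [name for name, test in CHECKS.items() if not test(cfg)]
    return len(failed), failed
```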
