I am new to Splunk and am exploring its ability to perform complex transaction matching in order to report or alarm on the current state (up, down, degraded) of multiple entities. I am not clear how much of the following could be achieved using Splunk alone and how much would require a bolt-on extra (any suggestions welcome), apart from, say, Nagios to do the actual reporting.
Specifically I would see within a syslog stream the following sequence of messages:
Message from A that the tunnel to B is down
Message from A that the tunnel to C is down
Message from B that tunnel to A is now up
Message from A that tunnel to B is now degraded
Splunk seems to have the pattern matching capability to identify a transaction (in this case tunnel state), but I am not clear whether you would have to feed the results into a separate state engine to track current state. It is also unclear to me how these queries would be prepared and run. Can Splunk automatically index multiple transactions based upon certain rules, without my having to explicitly specify, say, the triplet which describes each and every entity?
There are two techniques that you can use to track system state: the dedup command and lookup tables.
To use the dedup command, assume that your data here has three fields: from, to, state. Your search would look like: ... | dedup from to
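Because Splunk returns events newest first by default and dedup keeps only the first result it sees for each combination of field values, the result set would be the most recent message for a given (from, to) pair and would represent the current system state.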
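To make the dedup logic concrete, here is a minimal Python sketch of what the search is doing. It is only a model of the behaviour, not how Splunk implements it. The field names from, to and state come from the example above; the _time values and the function name current_state are assumptions made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Event = Dict[str, object]

def current_state(events: Iterable[Event]) -> Dict[Tuple[str, str], str]:
    """Keep the newest event per (from, to) pair, like dedup on newest-first results."""
    state: Dict[Tuple[str, str], str] = {}
    # Walk the events newest first, as a default Splunk search would return them.
    for event in sorted(events, key=lambda e: e["_time"], reverse=True):
        key = (event["from"], event["to"])
        if key not in state:  # the first hit for a pair is its latest message
            state[key] = event["state"]
    return state

# The four syslog messages from the question, with made-up timestamps.
events = [
    {"_time": 1, "from": "A", "to": "B", "state": "down"},
    {"_time": 2, "from": "A", "to": "C", "state": "down"},
    {"_time": 3, "from": "B", "to": "A", "state": "up"},
    {"_time": 4, "from": "A", "to": "B", "state": "degraded"},
]
state = current_state(events)
```

Note that (A, B) and (B, A) come out as separate pairs here, just as they would with dedup from to. If both directions describe the same tunnel in your data, you would need to normalise the pair before deduplicating.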
In the worst case, however, this search would have to run over ALL data to assemble the system state. The way to speed this up is to use a saved search that periodically persists the current state, along with the timestamp of the last change, into a lookup table (using outputlookup). When retrieving the current state, use the dedup approach over a recent window of data, append the full lookup table and use stats to pick the newest version of every state variable. The same recipe is used to persist the state back to the lookup table.
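The merge step can be sketched in Python as follows. Again, this only models the logic (recent events combined with the persisted table, newest entry wins per pair) and is not Splunk code. The function name merge_state, the in-memory dictionary standing in for the lookup table, and the timestamps are all assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]            # (from, to)
Entry = Tuple[int, str]          # (time of last change, state)

def merge_state(persisted: Dict[Key, Entry],
                recent_events: Iterable[dict]) -> Dict[Key, Entry]:
    """Combine the persisted table with a recent window, keeping the newest entry per pair."""
    merged = dict(persisted)  # start from the last persisted snapshot
    for event in recent_events:
        key = (event["from"], event["to"])
        # Replace the stored entry only if this event is newer than what we have.
        if key not in merged or event["_time"] > merged[key][0]:
            merged[key] = (event["_time"], event["state"])
    return merged

# Stand-in for the lookup table written by the previous saved-search run.
persisted = {("A", "B"): (1, "down"), ("A", "C"): (2, "down")}

# Events seen since that run.
recent = [
    {"_time": 3, "from": "B", "to": "A", "state": "up"},
    {"_time": 4, "from": "A", "to": "B", "state": "degraded"},
]
merged = merge_state(persisted, recent)
```

The returned dictionary is both the answer to "what is the current state?" and the new snapshot to write back, which is why the retrieval and persistence searches can share one recipe. Pairs with no recent events, such as (A, C) here, simply carry over from the table.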