Using cluster to find anomalous events

skippylou
Communicator

Trying to figure out if this is possible.

Many times I do a search similar to:

host=somehosts* earliest=-1d | cluster

to find what new types of events/messages I may see.

What I'm trying to figure out is if there is a way to save that 'cluster state' so that if I run the same command tomorrow or whenever, it filters out the matches that I would have already seen. Essentially only showing new clustered messages/events each time I run it.

carasso
Splunk Employee

Here's one I came up with. The idea is to cluster events over the last N hours, and then see if there are any clusters that consist only of events from the last 5 minutes. Those are your new types of events -- they don't look like anything seen in the last N hours.

Search back 10 hours; cluster the results; for each cluster, keep all the _raw values, the oldest time, and the size; then keep only those clusters whose oldest event is newer than 5 minutes old; sort by size to get the smallest new clusters first.

earliest=-10h | cluster t=0.7  labelonly=t showcount=t  
| stats values(_raw) as raw last(_time) as time last(cluster_count) as size by cluster_label 
| eval minute5=now()-5*60  
| where time > minute5 
| sort size 
| fields size, cluster_label, raw 

You can schedule this as an alert that runs every 5 minutes. You can tweak the initial 10-hour window, the 5-minute range, and the t=0.7 threshold as needed. If there are too many new clusters, widen the 10-hour window to avoid false positives from events that recur, for example, every 15 hours. If there are still too many new clusters, lower t (e.g., to 0.6) so that clustering is looser and only radically different events get flagged.
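For instance, a tuned variant with a wider baseline and a looser threshold might look like the following sketch (the 24-hour window and t=0.6 are illustrative values, not recommendations):

earliest=-24h | cluster t=0.6 labelonly=t showcount=t
| stats values(_raw) as raw last(_time) as time last(cluster_count) as size by cluster_label
| eval minute5=now()-5*60
| where time > minute5
| sort size
| fields size, cluster_label, raw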

Hope that helps.

adoshi
Explorer

David, should the last(_time) be first(_time), since you are looking for clusters that have formed in the last 5 minutes?


sideview
SplunkTrust

This is somewhat abstract, but you can keep a canonical list of things in a lookup (really just a CSV file) and then manage it by using

inputlookup <filename>

outputlookup <filename>

and you can create scheduled searches to constantly append to or prune it as needed.
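For example, a scheduled prune might look like this minimal sketch; the myCanonicalLookup name and the lastSeen field are hypothetical placeholders, and it assumes your merge search also records a lastSeen epoch timestamp per row (the searches below don't, so you'd have to add that):

| inputlookup myCanonicalLookup
| where lastSeen > relative_time(now(), "-30d")
| outputlookup myCanonicalLookup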

The data you'll get out of cluster is a little different, but in other, more typical situations your search has some relatively stable fields, and those fields also get preserved in the lookup table, so you can basically run the following search to continually merge new data into the lookup:

<your search> | append [ inputlookup myCanonicalLookup ] | stats first(foo) as foo first(bar) as bar by someStableUniqueField | outputlookup myCanonicalLookup

and then you can do something like this to filter out all the old stuff and get down to only what's new:

<your search> | append [ inputlookup myCanonicalLookup | eval hasBeenSeen="true" ] | stats first(foo) as foo first(bar) as bar first(hasBeenSeen) as hasBeenSeen by someStableUniqueField | search hasBeenSeen!="true"

You can probably use dedup in the same manner if you're more familiar with it. I'm generally taking sums in there too, so I use stats. And sometimes I just use stats first(*) as * by foo.
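If you'd rather use dedup, a minimal sketch of the same merge might be the following (myCanonicalLookup and someStableUniqueField are the same placeholder names as above; this relies on dedup keeping the first row it sees per key, so the fresh search results win over the appended lookup rows):

<your search> | append [ inputlookup myCanonicalLookup ] | dedup someStableUniqueField | outputlookup myCanonicalLookup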

skippylou
Communicator

Thanks Nick. I'll mess around with this a bit and give it a try.
