Using cluster to find anomalous events

skippylou
Communicator

Trying to figure out if this is possible.

Many times I do a search similar to:

host=somehosts* earliest=-1d | cluster

to find what new types of events/messages I may see.

What I'm trying to figure out is if there is a way to save that 'cluster state' so that if I run the same command tomorrow or whenever, it filters out the matches that I would have already seen. Essentially only showing new clustered messages/events each time I run it.

carasso
Splunk Employee

Here's one I came up with. The idea is to cluster events over the last N hours, and then see if there are any clusters that consist only of events from the last 5 minutes. Those are your new types of events -- they don't look like anything seen in the last N hours.

Search back 10 hours; cluster the results; for each cluster, keep all the _raw values, the oldest time, and the size; then keep only those clusters whose oldest event is newer than 5 minutes old; sort by size to get the smallest new clusters first.

earliest=-10h | cluster t=0.7  labelonly=t showcount=t  
| stats values(_raw) as raw last(_time) as time last(cluster_count) as size by cluster_label 
| eval minute5=now()-5*60  
| where time > minute5 
| sort size 
| fields size, cluster_label, raw 

You can schedule this as an alert that runs every 5 minutes. You can tweak the initial 10-hour window, the 5-minute range, and the t=0.7 threshold as needed. If there are too many new clusters, widen the 10-hour window to avoid false positives from events that recur, for example, every 15 hours. If there are still too many new clusters, lower t (e.g., to 0.6) so that clustering is looser and only radically different events get flagged.
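For instance, a tuned variant with a wider baseline and a looser threshold might look like the following sketch (the 24-hour window and t=0.6 are illustrative values, not recommendations):

earliest=-24h | cluster t=0.6 labelonly=t showcount=t
| stats values(_raw) as raw last(_time) as time last(cluster_count) as size by cluster_label
| eval minute5=now()-5*60
| where time > minute5
| sort size
| fields size, cluster_label, raw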

Hope that helps.

adoshi
Explorer

David, should the last(_time) be first(_time), since you are looking for clusters that have formed in the last 5 minutes?


sideview
SplunkTrust

This is somewhat abstract, but you can keep a canonical list of things in a lookup (really just a CSV file) and then manage it by using

inputlookup <filename>

outputlookup <filename>

and you can create scheduled searches to constantly append to or prune it as needed.
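For example, a scheduled prune might look like this minimal sketch; the myCanonicalLookup name and the lastSeen field are hypothetical placeholders, and it assumes your merge search also records a lastSeen epoch timestamp per row (the searches below don't, so you'd have to add that):

| inputlookup myCanonicalLookup
| where lastSeen > relative_time(now(), "-30d")
| outputlookup myCanonicalLookup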

The data you'll get out of cluster is a little different, but in other, more typical situations your search has some relatively stable fields, and those fields also get preserved in the lookup table, so you can basically run the following search to continually merge new data into the lookup:

<your search> | append [ inputlookup myCanonicalLookup ] | stats first(foo) as foo first(bar) as bar by someStableUniqueField | outputlookup myCanonicalLookup

and then you can do something like this to filter out all the old stuff and get down to only what's new:

<your search> | append [ inputlookup myCanonicalLookup | eval hasBeenSeen="true" ] | stats first(foo) as foo first(bar) as bar first(hasBeenSeen) as hasBeenSeen by someStableUniqueField | search hasBeenSeen!="true"

You can probably use dedup in the same manner if you're more familiar with it. I'm generally taking sums in there too, so I use stats. And sometimes I just use stats first(*) as * by foo.
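If you'd rather use dedup, a minimal sketch of the same merge might be the following (myCanonicalLookup and someStableUniqueField are the same placeholder names as above; this relies on dedup keeping the first row it sees per key, so the fresh search results win over the appended lookup rows):

<your search> | append [ inputlookup myCanonicalLookup ] | dedup someStableUniqueField | outputlookup myCanonicalLookup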

skippylou
Communicator

Thanks Nick. I'll mess around with this a bit and give it a try.
