How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

Jaravuy
New Member

How can I do prediction with the different algorithms like Clustering, Sequence Clustering, etc in Splunk?
Splunk uses a Kalman filter (in the predict command), but I need to try different algorithms.

Can anyone help?

aljohnson_splun
Splunk Employee

In response to your comment about unsupervised learning, there are two commands you might find useful.

kmeans

Partitions the events into k clusters, with each cluster defined by its mean value. Each event belongs to the cluster with the nearest mean value. Performs k-means clustering on the list of fields that you specify. If no fields are specified, performs the clustering on all numeric fields. Events in the same cluster are moved next to each other. You have the option to display the cluster number for each event.

Note that kmeans only works with numeric fields. Example:

... | kmeans k=4 disttype=cosine count
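To make the "nearest mean" description concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) using cosine distance, mirroring `disttype=cosine` in the example. It is an illustration under my own assumptions, not Splunk's implementation: the function names, the random initialisation and the `predict` helper are hypothetical. `predict` shows one way fitted centroids can be used to assign new events to a cluster, which is how clustering can be turned into a kind of prediction.

```python
import math
import random

def cosine_distance(a, b):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid_of(points):
    """Arithmetic mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def predict(point, centroids, dist=cosine_distance):
    """Index of the centroid nearest to point (hypothetical helper)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))

def kmeans(points, k, dist=cosine_distance, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members, until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points as starting centroids
    labels = None
    for _ in range(iters):
        new_labels = [predict(p, centroids, dist) for p in points]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = centroid_of(members)
    return labels, centroids
```

With cosine distance only the direction of each vector matters, so `[1, 0]` and `[2, 0.1]` end up together even though their magnitudes differ.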

cluster (see wiki: agglomerative clustering)

The cluster command groups events together based on how similar they are to each other. Unless you specify a different field, cluster groups events based on the contents of the _raw field. The default grouping method is to break down the events into terms (match=termlist) and compute the vector between events. Set a higher threshold value for t if you want the command to be more discriminating about which events are grouped together.
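Here is a simplified sketch of why a higher t yields more, tighter groups. It assumes each event is reduced to an unordered set of terms, compared with cosine similarity, and grouped in a single greedy pass. Splunk's exact similarity measure and grouping order are not reproduced here, and the default match=termlist also takes term order into account, so this is closer in spirit to match=termset. All function names are hypothetical.

```python
import math
import re

def terms(text):
    """Break an event into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Cosine similarity of two binary term vectors (term sets)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def group_events(events, t=0.8):
    """Greedy single pass: an event joins the first group whose first event is
    at least t similar to it, otherwise it starts a new group.
    Returns a 0-based group index per event."""
    reps = []        # term set of the first event in each group
    assignment = []
    for ev in events:
        ev_terms = terms(ev)
        for idx, rep in enumerate(reps):
            if similarity(ev_terms, rep) >= t:
                assignment.append(idx)
                break
        else:
            reps.append(ev_terms)
            assignment.append(len(reps) - 1)
    return assignment

events = [
    "user alice logged in from host1",
    "user bob logged in from host2",
    "disk full on host1",
]
```

The two login events share 4 of their 6 terms (similarity of about 0.67), so they are grouped together at t=0.5 but split apart at t=0.8.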

The result of the cluster command appends two new fields to each event. You can specify what to name these fields with the countfield and labelfield parameters, which default to cluster_count and cluster_label. The cluster_count value is the number of events that are part of the cluster, or the cluster size. Each event in the cluster is assigned the cluster_label value of the cluster it belongs to. For example, if the search returns 10 clusters, then the clusters are labeled from 1 to 10.
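The output side can be sketched independently of how the groups are found. Given one group id per event (from whatever grouping step you use), this hypothetical helper appends a size field and a 1-based label field whose default names mirror cluster_count and cluster_label. Numbering clusters in order of first appearance is my assumption; Splunk may order them differently.

```python
from collections import Counter

def annotate(events, group_ids, countfield="cluster_count", labelfield="cluster_label"):
    """Return copies of events (dicts) with a cluster-size field and a
    1-based cluster-label field added."""
    labels = {}
    for gid in group_ids:
        if gid not in labels:
            labels[gid] = len(labels) + 1  # clusters numbered 1..N by first appearance
    sizes = Counter(group_ids)             # number of events in each cluster
    out = []
    for ev, gid in zip(events, group_ids):
        row = dict(ev)
        row[countfield] = sizes[gid]
        row[labelfield] = labels[gid]
        out.append(row)
    return out
```

Passing `countfield` and `labelfield` lets you rename the two fields, just as the command's parameters do.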

Note that cluster only works with textual data. It's actually what powers the Patterns tab.


As @jeffland mentions, there are a number of algorithms available in the Machine Learning Toolkit (ML Toolkit). At the time of writing, BIRCH, DBSCAN, SpectralClustering, and KMeans are all available for unsupervised tasks. Check out the ML Toolkit docs as well.
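To show how a density-based algorithm such as DBSCAN differs from k-means (no k is chosen up front, and points in sparse regions are left unassigned as noise), here is a minimal pure-Python sketch. It follows scikit-learn's conventions as I understand them (an `eps` radius, `min_samples` counting the point itself, label -1 for noise); whether the ML Toolkit's DBSCAN exposes exactly these parameters is an assumption, so check its docs before relying on the names.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1        # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue             # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(j_nbrs)
    return labels
```

For two tight groups of three points plus one far-away point, `dbscan(pts, eps=1.5, min_samples=3)` finds two clusters and marks the outlier as -1 without being told how many clusters to look for.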

jeffland
Champion

You want to check out the Machine Learning App. It has some interesting use cases and shows what you can currently do with Splunk in that regard. It may not fulfill all your needs, but I believe it's a work in progress, so there's more to come (I'm looking forward to it as well).

Jaravuy
New Member

How do I do unsupervised learning in Splunk?
