How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

Jaravuy
New Member

How can I do prediction with different algorithms, such as clustering or sequence clustering, in Splunk?
Splunk uses a Kalman filter, but I need to try different algorithms.

Can anyone help?

aljohnson_splun
Splunk Employee

In response to your comment about unsupervised learning, there are two commands you might find useful.

kmeans

Partitions the events into k clusters, with each cluster defined by its mean value. Each event belongs to the cluster with the nearest mean value. Performs k-means clustering on the list of fields that you specify. If no fields are specified, performs the clustering on all numeric fields. Events in the same cluster are moved next to each other. You have the option to display the cluster number for each event.

Note that kmeans only works with numeric fields. Example:

... | kmeans k=4 disttype=cosine count
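
For a slightly fuller picture, here is a hedged sketch that names the fields to cluster on and then summarizes each cluster. The index, the sourcetype, and the cluster_id field name are placeholders chosen for illustration, and the search assumes your events carry the numeric date_hour and date_minute fields:

```spl
index=main sourcetype=access_combined
| kmeans k=4 cfield=cluster_id date_hour date_minute
| stats count avg(date_hour) avg(date_minute) by cluster_id
```

The cfield option only sets the name of the field that holds the cluster number. Adjust k and the field list to whatever numeric fields your data actually has.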

cluster (see wiki: agglomerative clustering)

The cluster command groups events together based on how similar they are to each other. Unless you specify a different field, cluster groups events based on the contents of the _raw field. The default grouping method breaks each event into terms (match=termlist) and compares the resulting term vectors. Set a higher threshold value for t if you want the command to be more discriminating about which events are grouped together.

The result of the cluster command appends two new fields to each event. You can specify what to name these fields with the countfield and labelfield parameters, which default to cluster_count and cluster_label. The cluster_count value is the number of events that are part of the cluster, or the cluster size. Each event in the cluster is assigned the cluster_label value of the cluster it belongs to. For example, if the search returns 10 clusters, then the clusters are labeled from 1 to 10.

Note that cluster only works with textual data. It's actually what powers the patterns tab.
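
As a rough sketch of how that looks in practice, the search below clusters non-INFO splunkd log lines and lists the largest clusters first. It assumes you can read the _internal index and that the splunkd sourcetype extracts a log_level field on your instance. The t value of 0.9 is only an example:

```spl
index=_internal sourcetype=splunkd log_level!=INFO
| cluster t=0.9 showcount=true
| sort limit=20 -cluster_count
| table cluster_count _raw
```

showcount=true is what makes cluster_count available for sorting. Try lowering t if too many near-identical messages end up in separate clusters.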


As @jeffland mentions, there are a number of algorithms available in the ML Toolkit. At the time of writing, BIRCH, DBSCAN, SpectralClustering, and KMeans are all available for unsupervised tasks. Check out the docs for the ML Toolkit as well.
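
If you have the ML Toolkit installed, a minimal sketch of its fit command with KMeans might look like the following. It assumes the iris.csv sample lookup that ships with the toolkit is available, and that KMeans writes its assignment to a field named cluster. Check both against your installed version:

```spl
| inputlookup iris.csv
| fit KMeans k=3 petal_length petal_width sepal_length sepal_width
| stats count by cluster species
```

Swapping KMeans for one of the other algorithm names is the general idea, though each algorithm takes its own parameters, so consult the toolkit docs before doing so.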

jeffland
SplunkTrust

You want to check out the Machine Learning App. It has some interesting use cases, and it shows you what you can currently do with Splunk in that regard. It may not fulfill all your needs, but I believe it's a work in progress, so there's more to come (I'm looking forward to it as well).

Jaravuy
New Member

How can I do unsupervised learning in Splunk?
