How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

New Member

How can I do prediction in Splunk with different algorithms, such as clustering or sequence clustering?
Splunk uses a Kalman filter, but I need to try different algorithms.

Can anyone help?

Re: How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

New Member

How can I do unsupervised learning in Splunk?

Re: How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

Champion

You should check out the Machine Learning App. It has some interesting use cases and shows what you can currently do with Splunk in that regard. It may not fulfill all your needs, but I believe it's a work in progress, so there's more to come (I'm looking forward to it as well).

Re: How can I do prediction with the different algorithms like clustering, sequence clustering, etc in Splunk?

Splunk Employee

In response to your comment about unsupervised learning, there are two commands you might find useful.

kmeans

Partitions the events into k clusters, with each cluster defined by its mean value. Each event belongs to the cluster with the nearest mean value. Performs k-means clustering on the list of fields that you specify. If no fields are specified, performs the clustering on all numeric fields. Events in the same cluster are moved next to each other. You have the option to display the cluster number for each event.

Note that kmeans only works with numeric fields. Example:

... | kmeans k=4 disttype=cosine count

cluster (see wiki: agglomerative clustering)

The cluster command groups events together based on how similar they are to each other. Unless you specify a different field, cluster groups events based on the contents of the _raw field. The default grouping method is to break down the events into terms (match=termlist) and compute the vector between events. Set a higher threshold value for t (a number between 0 and 1; the default is 0.8) if you want the command to be more discriminating about which events are grouped together.

The cluster command appends two new fields to each event. You can specify what to name these fields with the countfield and labelfield parameters, which default to cluster_count and cluster_label. The cluster_count value is the number of events that are part of the cluster, or the cluster size. Each event in the cluster is assigned the cluster_label value of the cluster it belongs to. For example, if the search returns 10 clusters, then the clusters are labeled from 1 to 10.

Note that cluster only works with textual data. It's actually what powers the Patterns tab.
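
As a rough sketch of how the cluster options fit together, something like the following should list the largest clusters first. The index and filter are only placeholders I picked for illustration (any search that returns raw events will do), and I've set showcount, countfield, and labelfield explicitly so the output field names don't depend on defaults:

```
index=_internal sourcetype=splunkd log_level!=INFO
| cluster t=0.9 showcount=true countfield=cluster_count labelfield=cluster_label
| table cluster_count cluster_label _raw
| sort - cluster_count
```

Lowering t towards 0 merges more events into each cluster; raising it towards 1 splits them apart.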


As @jeffland mentions, there are a number of algorithms available for use in the ML Toolkit. At the time of writing, BIRCH, DBSCAN, SpectralClustering, and KMeans are all available for unsupervised tasks. Check out the docs for the ML Toolkit as well.
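
For a sense of what that looks like in practice, here is a minimal sketch using the toolkit's fit command with KMeans. The index, the two field names, and the model name are hypothetical placeholders, so swap in your own numeric fields, and check the exact parameters against the docs for your ML Toolkit version:

```
index=your_index
| fit KMeans field_a field_b k=4 into example_kmeans_model
| stats count by cluster
```

The fit command should add a cluster field to each event, which the last line uses to count the events per cluster.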