Solved: How can I do prediction with the different algorit...

Jaravuy · ‎01-10-2016

How can I do prediction with different algorithms, such as clustering or sequence clustering, in Splunk?
Splunk uses a Kalman filter, but I need to try different algorithms.

Can anyone help?

aljohnson_splun · ‎01-11-2016

In response to your comment about unsupervised learning, there are two commands you might find useful.

kmeans

Partitions the events into k clusters, with each cluster defined by its mean value. Each event belongs to the cluster with the nearest mean value. Performs k-means clustering on the list of fields that you specify. If no fields are specified, performs the clustering on all numeric fields. Events in the same cluster are moved next to each other. You have the option to display the cluster number for each event.

Note that kmeans only works with numeric fields. Example:

... | kmeans k=4 disttype=cosine count
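To make the description above concrete, here is a minimal Python sketch of the k-means idea: assign each point to the nearest mean, then recompute the means. It is not Splunk's implementation. The function name `kmeans_sketch`, the evenly spaced starting centroids, and the squared Euclidean distance are all assumptions made for illustration (the SPL example above asks for cosine distance instead).

```python
def kmeans_sketch(points, k, iters=20):
    """Plain k-means on tuples of numbers (hypothetical helper, not Splunk code)."""
    n = len(points)
    # Assumption: deterministic start from k evenly spaced input points.
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest mean
        # (squared Euclidean distance, ties go to the lowest cluster index).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Two numeric fields per event, forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans_sketch(points, k=2)
```

Here `labels` plays the role of the cluster number that kmeans can display for each event, and only numeric values can be averaged this way, which is why the command is limited to numeric fields.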

cluster (see wiki: agglomerative clustering)

The cluster command groups events together based on how similar they are to each other. Unless you specify a different field, cluster groups events based on the contents of the _raw field. The default grouping method is to break down the events into terms (match=termlist) and compute the vector between events. Set a higher threshold value for t if you want the command to be more discriminating about which events are grouped together.

The result of the cluster command appends two new fields to each event. You can specify what to name these fields with the countfield and labelfield parameters, which default to cluster_count and cluster_label. The cluster_count value is the number of events that are part of the cluster, or the cluster size. Each event in the cluster is assigned the cluster_label value of the cluster it belongs to. For example, if the search returns 10 clusters, then the clusters are labeled from 1 to 10.
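As a rough illustration of the two paragraphs above, the sketch below breaks each event into terms, groups events whose term sets are at least t similar, and attaches cluster_label and cluster_count to every event. This is not how Splunk implements cluster: the Jaccard similarity, the greedy single pass against each cluster's first event, and the helper names `terms`, `jaccard`, and `cluster_sketch` are assumptions chosen to keep the example short.

```python
import re
from collections import Counter


def terms(text):
    """Break raw event text into a set of lowercase word terms."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Similarity of two term sets: shared terms divided by all terms."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_sketch(events, t=0.8):
    """Greedy grouping (assumed method): an event joins the first cluster
    whose first event is at least t similar, else it starts a new cluster."""
    representatives = []  # term set of the first event in each cluster
    labels = []
    for event in events:
        event_terms = terms(event)
        for index, rep in enumerate(representatives):
            if jaccard(event_terms, rep) >= t:
                labels.append(index + 1)  # labels start at 1
                break
        else:
            representatives.append(event_terms)
            labels.append(len(representatives))
    counts = Counter(labels)
    return [
        {"_raw": event, "cluster_label": label, "cluster_count": counts[label]}
        for event, label in zip(events, labels)
    ]


events = [
    "ERROR disk full on host web01",
    "ERROR disk full on host web02",
    "INFO user alice logged in",
    "ERROR disk full on host web03",
]
loose = cluster_sketch(events, t=0.6)   # the three disk errors share a cluster
strict = cluster_sketch(events, t=0.8)  # a higher t splits them apart
```

With t=0.6 the three disk-full events end up in one cluster of size 3; raising t to 0.8 puts every event in its own cluster, which mirrors the advice that a higher threshold makes the grouping more discriminating.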

Note that cluster only works with textual data. It's actually what powers the Patterns tab.

As @jeffland mentions, there are a number of algorithms available in the ML Toolkit. At the time of writing, BIRCH, DBSCAN, SpectralClustering, and KMeans are all available for unsupervised tasks. Check out the docs for the ML Toolkit as well.

jeffland · ‎01-11-2016

You'll want to check out the Machine Learning App. It has some interesting use cases and shows what you can currently do with Splunk in that regard. It may not fulfill all your needs, but I believe it's a work in progress, so there's more to come (I'm looking forward to it as well).

Jaravuy · ‎01-10-2016

How can I do unsupervised learning in Splunk?
