Using cluster command and showing the unique contents of each cluster - How?

thisissplunk
Builder

I have a field named "Serial" whose values vary ever so subtly between events. I am using the cluster command to combine these similar values into groups/clusters. This part works.

However, I cannot figure out how to list the unique values that make up each cluster after combining them. That is the whole point: I need to know which values are closely related. The results display only ONE value for each cluster in the table. There are over 6,000 unique values that collapse into about 30 clusters after running the command, and I need the list of 6,000 broken out by cluster.


Example data set that will return two clusters:

  • Serial=123456789
  • Serial=123456788
  • Serial=123456787
  • Serial=987654321
  • Serial=987654322
  • Serial=987654323

The basic working query:

  • index=stuff | cluster t=0.35 field=Serial | table cluster_count, cluster_label, Serial | sort - cluster_count

Data returned from the query (columns: cluster_count, cluster_label, Serial):

  • 30 1 123456789
  • 23 2 987654321

My question is: how do I list all the values in each cluster instead of just one? Below is what I expected to work, but it returns the same as above: one "Serial" value per cluster_label value. I thought it would return all of the values in each cluster_label:

  • index=stuff Serial="*" | cluster t=0.35 field=Serial | stats values(Serial) by cluster_label

Help!

1 Solution

emccaslin
Path Finder

If I understand you correctly, what you are looking for is the 'labelonly=true' option. By default, cluster outputs only one representative event per cluster; with labelonly=true it keeps all of your events and annotates each one with the cluster it belongs to.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true

So with your example you will get this:

30 1 123456789
30 1 123456788
30 1 123456787
23 2 987654321
23 2 987654322
23 2 987654323

You can then see only the events from a specific cluster by searching on the cluster_label.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true | search cluster_label=2

This will return:

23 2 987654321
23 2 987654322
23 2 987654323
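
To see every cluster's members in one result set instead of filtering on one label at a time, one option is to pass the labelled events to stats. This is only a sketch, untested against this data: it reuses the index, field name, and threshold from the question, and the output field names unique_serials and serials are made up for illustration.

```
index=stuff Serial="*"
| cluster t=0.35 field=Serial labelonly=true
| stats dc(Serial) AS unique_serials, values(Serial) AS serials BY cluster_label
| sort - unique_serials
```

Because labelonly=true keeps every event, values(Serial) has all of a cluster's serials to collect, so each cluster_label row should carry a multivalue list rather than a single representative value.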

emccaslin
Path Finder

Glad to help!

KrithikaRamakri
Explorer

Is there any way to view the unique contents of all the clusters in one view? The above command displays the results only for one cluster label.

thisissplunk
Builder

This is exactly what I was looking for. The definition of labelonly did not make this obvious until I read it over a few times. Not sure why this isn't the default option.

Thank you. Now I know what data I'm looking at.
