Using cluster command and showing the unique contents of each cluster - How?

thisissplunk
Builder

I have a field in my events named "Serial" whose values vary ever so subtly. I am using the cluster command to combine these similar values into groups/clusters. This part works.

However, I cannot figure out how to list the unique values that make up each cluster after combining them. This is the whole point of what I'm trying to achieve: I need to know which values are closely related. The results display only ONE value for each cluster in the table. There are over 6,000 unique values that cluster down into about 30 clusters after running the command, and I need the list of 6,000 broken out by cluster.


Example data set that will return two clusters:

  • Serial=123456789
  • Serial=123456788
  • Serial=123456787
  • Serial=987654321
  • Serial=987654322
  • Serial=987654323

The basic working query:

  • index=stuff | cluster t=0.35 field=Serial | table cluster_count, cluster_label, Serial | sort - cluster_count

Data returned from the query:

  • 30 1 123456789
  • 23 2 987654321

My question is: How do I list all the values for each cluster instead of just one? Below is what I expected to work, but it returns the same as above: one "Serial" value per cluster_label value. I thought it would return all of the values in each cluster_label:

  • index=stuff Serial="*" | cluster t=0.35 field=Serial | stats values(Serial) by cluster_label

Help!

1 Solution

emccaslin
Path Finder

If I understand you correctly, what you are looking for is the 'labelonly=true' option. This will return all of your events, each one still labeled with the cluster it belongs to.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true

So with your example you will get this:

30 1 123456789
30 1 123456788
30 1 123456787
23 2 987654321
23 2 987654322
23 2 987654323

You can then see only the events from a specific cluster by searching on the cluster_label.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true | search cluster_label=2

will return this:


23 2 987654321
23 2 987654322
23 2 987654323
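
Building on this, here is a minimal sketch for listing the distinct Serial values of every cluster in one table, rather than one cluster at a time. It assumes the same index, field, and threshold as above. Because labelonly=true keeps every event, stats values(Serial) by cluster_label now has the full set of values to aggregate. The output field names members and member_count are illustrative only and can be renamed freely.

```
index=stuff Serial="*"
| cluster t=0.35 field=Serial labelonly=true
| stats values(Serial) AS members, dc(Serial) AS member_count BY cluster_label
| sort - member_count
```

Each result row should then be one cluster_label with its multivalue list of Serial values, largest cluster first. This has not been run against the data in this thread, so treat it as a starting point.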

emccaslin
Path Finder

Glad to help!

KrithikaRamakri
Explorer

Is there any way to view the unique contents of all the clusters in one view? The above command displays the results only for one cluster label.

thisissplunk
Builder

This is exactly what I was looking for. The definition of labelonly did not make this obvious until I read it over a few times. Not sure why this isn't the default option.

Thank you. Now I know what data I'm looking at.
