Using cluster command and showing the unique contents of each cluster - How?

Builder

I have a field in my events named "Serial" whose values vary ever so subtly. I am using the cluster command to combine these similar values into groups/clusters. This part works.

However, I cannot figure out how to list the unique values that make up each cluster after combining them. This is the whole point of what I'm trying to achieve: I need to know which values are closely related. The results display only ONE value for each cluster in the table. There are over 6,000 unique values that cluster down into about 30 clusters after running the command, and I need the list of 6,000 broken out by cluster.


Example data set that will return two clusters:

  • Serial=123456789
  • Serial=123456788
  • Serial=123456787
  • Serial=987654321
  • Serial=987654322
  • Serial=987654323

The basic working query:

  • index=stuff | cluster t=0.35 field=Serial | table cluster_count, cluster_label, Serial | sort - cluster_count

Data returned from the query:

  • 30 1 123456789
  • 23 2 987654321

My question is: How do I list all of the values for each cluster instead of just one? Below is what I expected to work, but it returns the same as above: one "Serial" value per cluster_label value. I thought it would return all of the values in each cluster_label:

  • index=stuff Serial="*" | cluster t=0.35 field=Serial | stats values(Serial) by cluster_label

Help!

Path Finder

If I understand you correctly, what you are looking for is the 'labelonly=true' option. It returns all of your events, each one labeled with the cluster it belongs to.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true

So with your example you will get this:

30 1 123456789
30 1 123456788
30 1 123456787
23 2 987654321
23 2 987654322
23 2 987654323

You can then see only the events from a specific cluster by searching on the cluster_label.

index=stuff Serial="*" | cluster t=0.35 field=Serial labelonly=true | search cluster_label=2

will return this:


23 2 987654321
23 2 987654322
23 2 987654323
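
If you want one row per distinct Serial rather than one row per event, a possible variation is to dedup on the label and the value together. This is a minimal sketch, not something I ran against your data; it assumes the default cluster_label field name and reuses the index and threshold from your query.

```spl
index=stuff Serial="*"
| cluster t=0.35 field=Serial labelonly=true
| dedup cluster_label Serial
| table cluster_label, Serial
| sort 0 cluster_label, Serial
```

The "sort 0" is there only so the sort does not truncate the list if you have more rows than its default limit.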

Path Finder

Glad to help!

Is there any way to view the unique contents of all the clusters in one view? The above command displays the results only for one cluster label.
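
One approach that should give every cluster in a single view, offered as a sketch rather than a tested answer: keep labelonly=true so each event carries its cluster_label, then aggregate with stats instead of filtering with search. The output names events, unique_serials, and serials are arbitrary names chosen for this example.

```spl
index=stuff Serial="*"
| cluster t=0.35 field=Serial labelonly=true
| stats count AS events dc(Serial) AS unique_serials values(Serial) AS serials by cluster_label
| sort - events
```

This should produce one row per cluster_label, with serials as a multivalue field holding the distinct Serial values in that cluster. It is the same stats values(Serial) idea from the original question; the difference is that labelonly=true keeps every event, so stats has more than one Serial per cluster to collect.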

Builder

This is exactly what I was looking for. The definition of labelonly did not make this obvious until I read it over a few times. Not sure why this isn't the default option.

Thank you. Now I know what data I'm looking at.
