Splunk Search

Show items which appear in clusters

viggor
Path Finder

I have a log file of the following sort:

vendor  productId  clusterId
A       1          1
B       2          1
A       3          1
C       4          4
D       8          8
D       9          8
D       10         10

Now I would like to select those vendors who have at least one productId that is contained in a cluster of size at least k. The cluster size is the number of rows with the same clusterId.

So, in the example above, vendors A and B both appear in a cluster of size 3, D in a cluster of size 2, and C in a cluster of size 1.

In SQL I would solve this using sub-queries, but I am not sure how to tackle this in Splunk.

somesoni2
Revered Legend

Try this (replace k with your numeric threshold and adjust the where clause as needed; >= matches "size at least k"):

your current search giving above output with fields vendor productId clusterId
| eventstats count as clusterSize by clusterId
| where clusterSize>=k
| stats count as productsCount by vendor
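
For anyone who wants to try this without the original log file, here is a minimal sketch that rebuilds the sample rows from the question inline and then applies the same eventstats / where / stats steps. The inline data construction (makeresults, split, mvexpand) and the helper fields row and parts are only stand-ins for the real base search, and the threshold k=2 is an arbitrary choice for illustration.

```
| makeresults
| eval row=split("A,1,1;B,2,1;A,3,1;C,4,4;D,8,8;D,9,8;D,10,10", ";")
| mvexpand row
| eval parts=split(row, ",")
| eval vendor=mvindex(parts, 0), productId=mvindex(parts, 1), clusterId=mvindex(parts, 2)
| fields vendor productId clusterId
| eventstats count as clusterSize by clusterId
| where clusterSize>=2
| stats count as productsCount by vendor
```

With k=2 this should return A (productsCount 2), B (productsCount 1) and D (productsCount 2). C drops out because its only cluster has size 1, and D's product 10 is not counted because cluster 10 also has size 1.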

mayurr98
Super Champion

How do you determine the cluster size? I mean, what is the logic to determine the cluster size for the input you have given?

viggor
Path Finder

The cluster size corresponds to the number of rows with the same clusterId.
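
To make that definition concrete, the following sketch attaches the size to every row of the sample data. It assumes the rows are rebuilt inline (the makeresults / split / mvexpand part and the fields row and parts are illustrative stand-ins for the real search); eventstats then counts the rows sharing each clusterId and writes that count onto each row.

```
| makeresults
| eval row=split("A,1,1;B,2,1;A,3,1;C,4,4;D,8,8;D,9,8;D,10,10", ";")
| mvexpand row
| eval parts=split(row, ",")
| eval vendor=mvindex(parts, 0), productId=mvindex(parts, 1), clusterId=mvindex(parts, 2)
| eventstats count as clusterSize by clusterId
| table vendor productId clusterId clusterSize
```

The three rows with clusterId 1 get clusterSize 3, the row with clusterId 4 gets 1, the two rows with clusterId 8 get 2, and the row with clusterId 10 gets 1.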

mayurr98
Super Champion

I did not understand how A and B both appear in a cluster of size 3, D in a cluster of size 2, and company C in a cluster of size 1.
To me, this does not match the number of rows with the same clusterId. Kindly explain in detail. Also, please put the corresponding size in a table, i.e., what the size will be at each row.
