Splunk Search

Unique values of one field based on unique values of another field

A_VA
Explorer

Morning, Splunkers!

I've got a fun one today. I need to find the most resource efficient way (i.e., fastest way that won't have my IT guys calling me up and wanting to know why their ports are smoking) to return the unique values in one field that only have unique values in another field. For example, in the following table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

Field AField B
Value AValue A1
Value AValue A2
Value BValue B1
Value CValue C1
Value CValue C2
Value CValue C3

 

The big problem here is Field B can be any number of different values, so I can't query specifically on what those values may be.

I have a solution for this, and it works, but it doesn't work "at scale" because I'm looking through literally billions of records to pull this information. 

Here's what I'm already doing:

 

| tstats count where index=myindex by Field A Field B
| stats values(Field B) as Field B by Field A
| where mvcount(Field B)=1

 

 

This takes a few minutes if I'm pulling, say, over 15 minutes, and I need to pull 90 days, and I really don't want to have to do it 15 minutes at a time and stitch everything together afterward. I will if I have to, but there's got to be a better way to do what I'm trying to do that won't make the system flip me the bird and call it a day.

Suggestions?

Labels (3)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

The answer can depend on data characteristics.  On first look, I thought your solution was as efficient as it can get.  You reduced a large dataset (billions of events) to a much smaller dataset, i.e., distinct values of "Field B" grouped by distinct values of "Field A".  Can you give a comparison of scale of these two datasets? (I.e., how many rows after stats before where, compared to how many raw events.)

A bigger problem is in problem statement.  There seems to be a misstatement in the description:


table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

I believe you meant to say "because they don't correspond to a unique value in Field A," not Field B because every value in "Field B" is unique in your illustration.  Is every value in "FieldB" unique in real data?  Your SPL seems to imply so.

Because otherwise mvcount(Field B)=1 can pick more than those corresponding to a unique "Field A".  For example, if dataset is

Field AField B
Value AValue A1
Value AValue A1
Value BValue B1
Value CValue C1
Value CValue C2
Value CValue C3

your search will pick two data points

Field AField B
Value AValue A1
Value BValue B1

But "Value A" is not unique in original data. (Neither is "Value A1".)  Can you clarify?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Perhaps your example is not large enough, but from the subject, perhaps you could try this

| stats count as count_B by "Field A" "Field B"
| eventstats count as count_A by "Field A"
| where count_A = 1
0 Karma
Get Updates on the Splunk Community!

.conf25 Community Recap

Hello Splunkers, And just like that, .conf25 is in the books! What an incredible few days — full of learning, ...

Splunk App Developers | .conf25 Recap & What’s Next

If you stopped by the Builder Bar at .conf25 this year, thank you! The retro tech beer garden vibes were ...

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...