Splunk Search

Unique values of one field based on unique values of another field

A_VA
Explorer

Morning, Splunkers!

I've got a fun one today. I need to find the most resource efficient way (i.e., fastest way that won't have my IT guys calling me up and wanting to know why their ports are smoking) to return the unique values in one field that only have unique values in another field. For example, in the following table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

Field AField B
Value AValue A1
Value AValue A2
Value BValue B1
Value CValue C1
Value CValue C2
Value CValue C3

 

The big problem here is Field B can be any number of different values, so I can't query specifically on what those values may be.

I have a solution for this, and it works, but it doesn't work "at scale" because I'm looking through literally billions of records to pull this information. 

Here's what I'm already doing:

 

| tstats count where index=myindex by Field A Field B
| stats values(Field B) as Field B by Field A
| where mvcount(Field B)=1

 

 

This takes a few minutes if I'm pulling, say, over 15 minutes, and I need to pull 90 days, and I really don't want to have to do it 15 minutes at a time and stitch everything together afterward. I will if I have to, but there's got to be a better way to do what I'm trying to do that won't make the system flip me the bird and call it a day.

Suggestions?

Labels (3)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

The answer can depend on data characteristics.  On first look, I thought your solution was as efficient as it can get.  You reduced a large dataset (billions of events) to a much smaller dataset, i.e., distinct values of "Field B" grouped by distinct values of "Field A".  Can you give a comparison of scale of these two datasets? (I.e., how many rows after stats before where, compared to how many raw events.)

A bigger problem is in problem statement.  There seems to be a misstatement in the description:


table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

I believe you meant to say "because they don't correspond to a unique value in Field A," not Field B because every value in "Field B" is unique in your illustration.  Is every value in "FieldB" unique in real data?  Your SPL seems to imply so.

Because otherwise mvcount(Field B)=1 can pick more than those corresponding to a unique "Field A".  For example, if dataset is

Field AField B
Value AValue A1
Value AValue A1
Value BValue B1
Value CValue C1
Value CValue C2
Value CValue C3

your search will pick two data points

Field AField B
Value AValue A1
Value BValue B1

But "Value A" is not unique in original data. (Neither is "Value A1".)  Can you clarify?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Perhaps your example is not large enough, but from the subject, perhaps you could try this

| stats count as count_B by "Field A" "Field B"
| eventstats count as count_A by "Field A"
| where count_A = 1
0 Karma
Get Updates on the Splunk Community!

Say goodbye to manually analyzing phishing and malware threats with Splunk Attack ...

In today’s evolving threat landscape, we understand you’re constantly bombarded with phishing and malware ...

AppDynamics is now part of Splunk Ideas

Hello Splunkers, We have exciting news for you! AppDynamics has been added to the Splunk Ideas Portal. Which ...

Advanced Splunk Data Management Strategies

Join us on Wednesday, May 14, 2025, at 11 AM PDT / 2 PM EDT for an exclusive Tech Talk that delves into ...