Re: Unique values of one field based on unique val...

A_VA · ‎07-16-2024

Morning, Splunkers!

I've got a fun one today. I need to find the most resource efficient way (i.e., fastest way that won't have my IT guys calling me up and wanting to know why their ports are smoking) to return the unique values in one field that only have unique values in another field. For example, in the following table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

Field A	Field B
Value A	Value A1
Value A	Value A2
Value B	Value B1
Value C	Value C1
Value C	Value C2
Value C	Value C3

The big problem here is Field B can be any number of different values, so I can't query specifically on what those values may be.

I have a solution for this, and it works, but it doesn't work "at scale" because I'm looking through literally billions of records to pull this information.

Here's what I'm already doing:

| tstats count where index=myindex by Field A Field B
| stats values(Field B) as Field B by Field A
| where mvcount(Field B)=1

This takes a few minutes if I'm pulling, say, over 15 minutes, and I need to pull 90 days, and I really don't want to have to do it 15 minutes at a time and stitch everything together afterward. I will if I have to, but there's got to be a better way to do what I'm trying to do that won't make the system flip me the bird and call it a day.

Suggestions?

yuanliu · ‎07-16-2024

The answer can depend on data characteristics. On first look, I thought your solution was as efficient as it can get. You reduced a large dataset (billions of events) to a much smaller dataset, i.e., distinct values of "Field B" grouped by distinct values of "Field A". Can you give a comparison of scale of these two datasets? (I.e., how many rows after stats before where, compared to how many raw events.)

A bigger problem is in problem statement. There seems to be a misstatement in the description:

table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.

I believe you meant to say "because they don't correspond to a unique value in Field A," not Field B because every value in "Field B" is unique in your illustration. Is every value in "FieldB" unique in real data? Your SPL seems to imply so.

Because otherwise mvcount(Field B)=1 can pick more than those corresponding to a unique "Field A". For example, if dataset is

Field A	Field B
Value A	Value A1
Value A	Value A1
Value B	Value B1
Value C	Value C1
Value C	Value C2
Value C	Value C3

your search will pick two data points

Field A	Field B
Value A	Value A1
Value B	Value B1

But "Value A" is not unique in original data. (Neither is "Value A1".) Can you clarify?

ITWhisperer · ‎07-16-2024

Perhaps your example is not large enough, but from the subject, perhaps you could try this

| stats count as count_B by "Field A" "Field B"
| eventstats count as count_A by "Field A"
| where count_A = 1

Unique values of one field based on unique values of another field

field extraction

fields

tstats

Dashboards: Hiding charts while search is being executed and other uses for tokens

Splunk Observability Cloud's AI Assistant in Action Series: Explaining Metrics and ...

Brains, Bytes, and Boston: Learn from the Best at .conf25

Are you a member of the Splunk Community?