- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Unique values of one field based on unique values of another field
Morning, Splunkers!
I've got a fun one today. I need to find the most resource efficient way (i.e., fastest way that won't have my IT guys calling me up and wanting to know why their ports are smoking) to return the unique values in one field that only have unique values in another field. For example, in the following table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.
Field A | Field B |
Value A | Value A1 |
Value A | Value A2 |
Value B | Value B1 |
Value C | Value C1 |
Value C | Value C2 |
Value C | Value C3 |
The big problem here is Field B can be any number of different values, so I can't query specifically on what those values may be.
I have a solution for this, and it works, but it doesn't work "at scale" because I'm looking through literally billions of records to pull this information.
Here's what I'm already doing:
| tstats count where index=myindex by Field A Field B
| stats values(Field B) as Field B by Field A
| where mvcount(Field B)=1
This takes a few minutes if I'm pulling, say, over 15 minutes, and I need to pull 90 days, and I really don't want to have to do it 15 minutes at a time and stitch everything together afterward. I will if I have to, but there's got to be a better way to do what I'm trying to do that won't make the system flip me the bird and call it a day.
Suggestions?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The answer can depend on data characteristics. On first look, I thought your solution was as efficient as it can get. You reduced a large dataset (billions of events) to a much smaller dataset, i.e., distinct values of "Field B" grouped by distinct values of "Field A". Can you give a comparison of scale of these two datasets? (I.e., how many rows after stats before where, compared to how many raw events.)
A bigger problem is in problem statement. There seems to be a misstatement in the description:
table my search result needs to only return Value B; Values A and C will be thrown out, because they don't have a unique value in Field B.
I believe you meant to say "because they don't correspond to a unique value in Field A," not Field B because every value in "Field B" is unique in your illustration. Is every value in "FieldB" unique in real data? Your SPL seems to imply so.
Because otherwise mvcount(Field B)=1 can pick more than those corresponding to a unique "Field A". For example, if dataset is
Field A | Field B |
Value A | Value A1 |
Value A | Value A1 |
Value B | Value B1 |
Value C | Value C1 |
Value C | Value C2 |
Value C | Value C3 |
your search will pick two data points
Field A | Field B |
Value A | Value A1 |
Value B | Value B1 |
But "Value A" is not unique in original data. (Neither is "Value A1".) Can you clarify?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Perhaps your example is not large enough, but from the subject, perhaps you could try this
| stats count as count_B by "Field A" "Field B"
| eventstats count as count_A by "Field A"
| where count_A = 1
