Splunk Search

Pulling out duplicates from multivalue fields and counting

steve_b_88
Engager

I'm trying to achieve the following and hoped someone could help?

I have a multivalue field that contains values that are colors, and would like to know how many fields contain duplicate colors, and what the value of those colors are.

e.g. my data

colors
blue
blue
red
yellow
red
blue
red
blue
red
red
green
green



Would return something like:

duplicate_colorduplicate_count
blue2
red1
green1


Because 'blue' is present as a duplicate in two entries, 'red' in one entry, and 'green' in one entry. 'yellow' is omitted because it is not a duplicate.

Thank you very much for any help
Steve

Labels (4)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

The solution becomes more obvious if I restate the problem like this: In addition to colors, you must have another field with four distinct values.  Let's call the additional field group, and give them values "a", "b", "c", and "d".

colorsgroup
bluea
bluea
reda
yellowb
redb
bluec
redc
bluec
redd
redd
greend
greend

When data structure is clear, what you are asking is to

  1. Find values of colors that appear more than once with each group value.
  2. Count how many distinct values of group for each of duplicated values of colors.

Hence,

 

| stats count by colors group
| where count > 1
| stats dc(group) as duplicate_count by colors

 

Here is a data emulation you can play with and compare with real data

 

| makeresults format=csv data="colors,group
blue,a
blue,a
red,a
yellow,b
red,b
blue,c
red,c
blue,c
red,d
red,d
green,d
green,d"
``` data emulation above ```

 

String the two together, you get

colorsduplicate_count
blue2
green1
red1

View solution in original post

steve_b_88
Engager

@yuanliu Thank you for your answer. This rephrasing of the problem is great and this solution helps solve my issue using an additional group (i have a `uuid` field for each row that I can use). Many many thanks.

@bowesmana Thank you. I do have a `uuid` field for each row, that I did not have in the question, and have gone ahead and used that.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If you do not have a 'group' value per event, then you can simply create one by putting

| streamstats c as group

before the first stats mentioned by @yuanliu 

yuanliu
SplunkTrust
SplunkTrust

The solution becomes more obvious if I restate the problem like this: In addition to colors, you must have another field with four distinct values.  Let's call the additional field group, and give them values "a", "b", "c", and "d".

colorsgroup
bluea
bluea
reda
yellowb
redb
bluec
redc
bluec
redd
redd
greend
greend

When data structure is clear, what you are asking is to

  1. Find values of colors that appear more than once with each group value.
  2. Count how many distinct values of group for each of duplicated values of colors.

Hence,

 

| stats count by colors group
| where count > 1
| stats dc(group) as duplicate_count by colors

 

Here is a data emulation you can play with and compare with real data

 

| makeresults format=csv data="colors,group
blue,a
blue,a
red,a
yellow,b
red,b
blue,c
red,c
blue,c
red,d
red,d
green,d
green,d"
``` data emulation above ```

 

String the two together, you get

colorsduplicate_count
blue2
green1
red1
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...