Splunk Search

Pulling out duplicates from multivalue fields and counting

steve_b_88
Engager

I'm trying to achieve the following and hoped someone could help?

I have a multivalue field that contains values that are colors, and would like to know how many fields contain duplicate colors, and what the value of those colors are.

e.g. my data

colors
blue
blue
red
yellow
red
blue
red
blue
red
red
green
green



Would return something like:

duplicate_colorduplicate_count
blue2
red1
green1


Because 'blue' is present as a duplicate in two entries, 'red' in one entry, and 'green' in one entry. 'yellow' is omitted because it is not a duplicate.

Thank you very much for any help
Steve

Labels (4)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

The solution becomes more obvious if I restate the problem like this: In addition to colors, you must have another field with four distinct values.  Let's call the additional field group, and give them values "a", "b", "c", and "d".

colorsgroup
bluea
bluea
reda
yellowb
redb
bluec
redc
bluec
redd
redd
greend
greend

When data structure is clear, what you are asking is to

  1. Find values of colors that appear more than once with each group value.
  2. Count how many distinct values of group for each of duplicated values of colors.

Hence,

 

| stats count by colors group
| where count > 1
| stats dc(group) as duplicate_count by colors

 

Here is a data emulation you can play with and compare with real data

 

| makeresults format=csv data="colors,group
blue,a
blue,a
red,a
yellow,b
red,b
blue,c
red,c
blue,c
red,d
red,d
green,d
green,d"
``` data emulation above ```

 

String the two together, you get

colorsduplicate_count
blue2
green1
red1

View solution in original post

steve_b_88
Engager

@yuanliu Thank you for your answer. This rephrasing of the problem is great and this solution helps solve my issue using an additional group (i have a `uuid` field for each row that I can use). Many many thanks.

@bowesmana Thank you. I do have a `uuid` field for each row, that I did not have in the question, and have gone ahead and used that.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If you do not have a 'group' value per event, then you can simply create one by putting

| streamstats c as group

before the first stats mentioned by @yuanliu 

yuanliu
SplunkTrust
SplunkTrust

The solution becomes more obvious if I restate the problem like this: In addition to colors, you must have another field with four distinct values.  Let's call the additional field group, and give them values "a", "b", "c", and "d".

colorsgroup
bluea
bluea
reda
yellowb
redb
bluec
redc
bluec
redd
redd
greend
greend

When data structure is clear, what you are asking is to

  1. Find values of colors that appear more than once with each group value.
  2. Count how many distinct values of group for each of duplicated values of colors.

Hence,

 

| stats count by colors group
| where count > 1
| stats dc(group) as duplicate_count by colors

 

Here is a data emulation you can play with and compare with real data

 

| makeresults format=csv data="colors,group
blue,a
blue,a
red,a
yellow,b
red,b
blue,c
red,c
blue,c
red,d
red,d
green,d
green,d"
``` data emulation above ```

 

String the two together, you get

colorsduplicate_count
blue2
green1
red1
Get Updates on the Splunk Community!

Splunk Observability for AI

Don’t miss out on an exciting Tech Talk on Splunk Observability for AI!Discover how Splunk’s agentic AI ...

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Watch On Demand the Tech Talk on November 6 at 11AM PT, and empower your SOC to reach new heights! Duration: ...

Splunk Observability as Code: From Zero to Dashboard

For the details on what Self-Service Observability and Observability as Code is, we have some awesome content ...