Splunk Search

dedup, distinct, similar

jgauthier
Contributor

All,

I am trying to remove duplicate values in a list of email addresses. I am loading the data from a CSV, and one field in that CSV is a semicolon-delimited list of email recipients. Some recipients appear in that list more than once. Obviously, they only received the email one time, and I want the metric to reflect that.

So, I've put together a query to start that looks like this:

sourcetype="email" sender="jgauthier*" subject="Specific email" | eval recipientlist=split(recipient, ";")

So my recipientlist now contains the duplicate email addresses (taken from recipient, but split up).

For one email, it appears that it's gone to the same person twice (since they are in the list twice).

I've played with dedup, but it doesn't seem to work in this regard.

Thanks for the help.


hazekamp
Builder

This is likely caused by the way dedup behaves with the multivalue (MV) field that eval's split() creates. Try:

sourcetype="email" sender="jgauthier*" subject="Specific email" | eval recipientlist=split(recipient, ";") | stats count by recipientlist | fields - count

If you want the count, just remove "| fields - count" from the search above.
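
If the goal is instead to count each address at most once per email, one option is to deduplicate inside the multivalue field before counting. A minimal sketch, assuming your Splunk version provides the mvdedup() eval function; unique_recipients is a hypothetical field name used only for illustration:

```spl
sourcetype="email" sender="jgauthier*" subject="Specific email"
| eval recipientlist=mvdedup(split(recipient, ";"))
| eval unique_recipients=mvcount(recipientlist)
```

Here split() builds the multivalue field, mvdedup() drops repeated values within that one event, and mvcount() returns how many values remain, so each email carries its own count of distinct recipients.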

hazekamp
Builder

I am not sure why stats is not satisfying the use case of "trying to remove duplicate values...". I tested this locally and if you have:

Event 1:
recipient=a;b;c;c

Event 2:
recipient=x;y;z;a

This search:
index=_internal | head 1 | eval recipient="a;b;c;c" | append [search index=_internal | head 1 | eval recipient="x;y;z;a"] | eval recipientList=split(recipient, ";") | stats count by recipientList

Produces:

a 2
b 1
c 2
x 1
y 1
z 1

This is a consolidated list of email addresses.
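
The same test can be sketched with per-event deduplication applied first. This assumes the makeresults command and the mvdedup() eval function are available in your Splunk version; the sample data is the two events above:

```spl
| makeresults
| eval recipient="a;b;c;c"
| append [| makeresults | eval recipient="x;y;z;a"]
| eval recipientList=mvdedup(split(recipient, ";"))
| stats count by recipientList
```

With duplicates removed inside each event before stats runs, c would be expected to count once instead of twice, while a would still count twice because it appears in two different events.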


jgauthier
Contributor

I saw Sorkin's thread before I posted and worked with it.
But neither that effort nor the one above seemed to accomplish this. Simply doing "stats count by recipientlist" gives me a count that includes the times the string exists multiple times in the recipientlist. "| fields - count" just gives me the field. This happens even if I have 100 distinct events, each with a field containing duplicate values.
Thanks!
