2 searches with dedup returning different results when using additional explicit field

arielofri
Engager

Hi, I'm running the following searches and getting different results for the same time range (All time) when comparing projects.

For example:

For this search, I'm getting many projects and their total unique "Defect ID"s. For the ACTIVATION project, I'm getting 23 results:

index="my_index" source="my_csv.csv"
| dedup "Defect ID"
| stats count by "Project Name"

For this search, I'm getting 36 results:

index="my_index" source="my_csv.csv" "Project Name"=ACTIVATION
| dedup "Defect ID"
| stats count by "Project Name"

Why am I getting MORE results when I add "Project Name"=ACTIVATION to the search?

When I instead add | search "Project Name"=ACTIVATION somewhere after the dedup command, I'm still getting 23:

index="my_index" source="my_csv.csv"
| dedup "Defect ID"
| search "Project Name"=ACTIVATION
| stats count by "Project Name"

manjunathmeti
Champion

Instead of dedup, try dc (distinct count) and check:

index="my_index" source="my_csv.csv"
| stats dc("Defect ID") by "Project Name"

manjunathmeti
Champion

Reason: As @to4kawa commented, another "Project Name" value in your data has the same "Defect ID". dedup removes duplicate values of the "Defect ID" field irrespective of "Project Name", so when a defect appears under several projects, only the first event found is kept, and that event may belong to a different project. Distinct count (dc) returns the number of distinct "Defect ID" values for each "Project Name". The query below also works for you, but you should avoid using dedup whenever possible.

index="my_index" source="my_csv.csv"
| dedup "Defect ID", "Project Name"
| stats count by "Project Name"
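
To see the effect in isolation, here is a minimal sketch on made-up data, not your CSV. The three generated events, the ids D-1 and D-2, and the BILLING project are all hypothetical. The only thing carried over from the thread is that one "Defect ID" appears under two projects.

```spl
| makeresults count=3
| streamstats count AS n
| eval defect_id=case(n=1, "D-1", n=2, "D-1", n=3, "D-2")
| eval project=case(n=1, "BILLING", n=2, "ACTIVATION", n=3, "ACTIVATION")
| rename defect_id AS "Defect ID", project AS "Project Name"
| dedup "Defect ID"
| stats count by "Project Name"
```

Because dedup keeps only the first event it sees for each "Defect ID", the ACTIVATION copy of D-1 should be dropped before stats runs, so ACTIVATION is undercounted. Replacing the last two lines with | stats dc("Defect ID") by "Project Name" should count D-1 once for each project.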

arielofri
Engager

It's working! Do you know why? What's the difference between dc() and dedup?

to4kawa
Ultra Champion

I see. So another "Project Name" has the same "Defect ID".
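
One way to confirm this on the real data is to list every "Defect ID" that occurs under more than one project. This is only a sketch built from the index and source named in the question; the field names projects and project_names are made up for the example.

```spl
index="my_index" source="my_csv.csv"
| stats dc("Project Name") AS projects values("Project Name") AS project_names by "Defect ID"
| where projects > 1
```

Any row returned is a defect that a plain dedup "Defect ID" can assign to only one of its projects.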

to4kawa
Ultra Champion

Because there is an event with "Project Name"=ACTIVATION that has no "Defect ID".
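
A quick check for that case, again assuming the index and source from the question, is to count the ACTIVATION events where the field is missing:

```spl
index="my_index" source="my_csv.csv" "Project Name"=ACTIVATION NOT "Defect ID"=*
| stats count
```

A count of 0 would rule this explanation out.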

arielofri
Engager

All events with "Project Name"=ACTIVATION have a "Defect ID"

manjunathmeti
Champion

Which count is correct, 23 or 36? Can you post some sample data from the file my_csv.csv?

arielofri
Engager

The 36 is the correct one. Unfortunately, I can't share the data.
