Why did my results change when I replaced dedup with stats dc in my search?

IRHM73
Motivator

I wonder whether someone could help me, please.

I initially used the search below, which gave these results for a given day:

Opted In = 1696
Opted Out = 858

auditSource=preferences auditType=TxSucceeded detail.input-preference-digital=*
| sort 0 detail.input-utr,-_time
| dedup detail.input-utr
| replace "true" with "Opted In" in detail.input-preference-digital
| replace "false" with "Opted Out" in detail.input-preference-digital
| eval inOrOut='detail.input-preference-digital'
| chart count by inOrOut
| eval pieSlice=inOrOut + " " + count
| fields pieSlice, count

I then replaced the dedup with a stats dc, as below, but now the figures come out as:

Opted In = 1695
Opted Out = 859

auditSource=preferences auditType=TxSucceeded detail.input-preference-digital=*
| replace "true" with "Opted In", "false" with "Opted Out" in detail.input-preference-digital
| eval inOrOut='detail.input-preference-digital'
| stats dc(detail.input-utr) first(inOrOut) As inOrOut By detail.input-utr
| chart count by inOrOut
| eval pieSlice=inOrOut + " " + count
| fields pieSlice, count
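To see which part of the stats step can change the figures, here is a minimal Python model of it. It assumes events are plain dicts arriving in the order stats receives them, and that first() keeps the value from the first event seen per group. The helper name stats_dc_first_by and the sample events (one hypothetical UTR "A" that opted in at _time 100 and opted out at _time 200) are illustrative assumptions, not Splunk internals. Because the search groups BY detail.input-utr, dc(detail.input-utr) is always 1 in each group, so only first(inOrOut) can vary, and it varies with input order.

```python
# Hypothetical model of:
#   stats dc(detail.input-utr) first(inOrOut) As inOrOut By detail.input-utr
# Events are plain dicts, processed in the order stats would receive them.


def stats_dc_first_by(events, by_field, dc_field, value_field):
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for event in events:
        key = event[by_field]
        if key not in groups:
            # first(): remember the value from the first event seen for this group
            groups[key] = {"distinct": set(), "first": event[value_field]}
        # dc(): collect distinct values of dc_field within the group
        groups[key]["distinct"].add(event[dc_field])
    return [
        {by_field: key, "dc": len(g["distinct"]), value_field: g["first"]}
        for key, g in groups.items()
    ]


# Hypothetical sample: one UTR that opted in, then later opted out.
newest_first = [
    {"detail.input-utr": "A", "_time": 200, "inOrOut": "Opted Out"},
    {"detail.input-utr": "A", "_time": 100, "inOrOut": "Opted In"},
]
oldest_first = list(reversed(newest_first))

# dc is 1 in both runs; inOrOut is "Opted Out" for the first order, "Opted In" for the second
for order in (newest_first, oldest_first):
    print(stats_dc_first_by(order, "detail.input-utr", "detail.input-utr", "inOrOut"))
```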

As you can see, one result has moved from Opted In to Opted Out, but I'm not sure why.

I just wondered whether someone could look at this and let me know where I've gone wrong.

Many thanks and kind regards

Chris


tom_frotscher
Builder

Hi,

In the first query you sort by detail.input-utr (ascending) and _time (descending) and then use dedup. When dedup finds multiple events with the same value, it keeps the first one and discards the rest, so the sort determines which event survives: here, the most recent event for each detail.input-utr.

The second search does the same thing implicitly, because first(inOrOut) in the stats also keeps the first value it sees for each detail.input-utr. Since that query has no sort, the "first" event per UTR may not be the same one the first query kept.

So try adding the same | sort 0 detail.input-utr,-_time to the second query, before the stats, and compare the results again.
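A minimal Python sketch of why the sort matters, under simplifying assumptions: events are plain dicts with a numeric _time, dedup keeps the first event per value, and first() keeps the first value seen per group. The helper names and the three sample events are hypothetical. With the sort applied, both approaches keep the newest event per UTR; without it, first() can pick an older event, which shifts one count between the two pie slices.

```python
from collections import Counter

# Hypothetical sample events in an arbitrary (unsorted) order;
# UTR "A" opted in at _time 100 and opted out at _time 200.
events = [
    {"detail.input-utr": "B", "_time": 50, "inOrOut": "Opted In"},
    {"detail.input-utr": "A", "_time": 100, "inOrOut": "Opted In"},
    {"detail.input-utr": "A", "_time": 200, "inOrOut": "Opted Out"},
]


def sort_utr_then_newest(rows):
    # Models: sort 0 detail.input-utr,-_time (UTR ascending, newest first)
    return sorted(rows, key=lambda e: (e["detail.input-utr"], -e["_time"]))


def dedup(rows, field):
    # Models: dedup field (keep the first row per value, drop the rest)
    seen, kept = set(), []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept


def first_by(rows, by_field, value_field):
    # Models: stats first(value_field) BY by_field (first value seen per group)
    firsts = {}
    for row in rows:
        firsts.setdefault(row[by_field], row[value_field])
    return firsts


ordered = sort_utr_then_newest(events)
via_dedup = Counter(e["inOrOut"] for e in dedup(ordered, "detail.input-utr"))
via_first_sorted = Counter(first_by(ordered, "detail.input-utr", "inOrOut").values())
via_first_unsorted = Counter(first_by(events, "detail.input-utr", "inOrOut").values())

print(via_dedup == via_first_sorted)  # True: same event kept per UTR
print(via_first_unsorted)             # UTR "A" counted as Opted In instead
```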

Greetings

Tom

IRHM73
Motivator

Hi Tom, I'm sorry to trouble you, but I wonder whether you could help me, please.

You kindly explained the reason for the disparity between my two queries, and adding the 'sort' certainly solved the problem.

But I inherited this query, so I'm a little unsure what 'sort 0 detail.input-utr,-_time' actually does.
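In SPL, the 0 in sort 0 removes the limit on how many results are sorted and returned (with no count, sort keeps at most 10,000), detail.input-utr sorts ascending by default, and the - prefix makes _time sort descending, so each UTR's newest event comes first. The Python sketch below models that ordering; spl_sort is a hypothetical helper, and it compares values with Python's ordering rather than Splunk's automatic numeric/lexicographic detection.

```python
from operator import itemgetter

# Hypothetical model of SPL: sort [count] field1, -field2, ...
# A leading "-" means descending; count=0 means "return every result".
DEFAULT_SORT_LIMIT = 10000  # SPL sort's limit when no count is given


def spl_sort(rows, keys, count=DEFAULT_SORT_LIMIT):
    ordered = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # last key first and the first key last yields a multi-key ordering.
    for key in reversed(keys):
        descending = key.startswith("-")
        field = key[1:] if descending else key
        ordered.sort(key=itemgetter(field), reverse=descending)
    return ordered if count == 0 else ordered[:count]


events = [
    {"detail.input-utr": "B", "_time": 50},
    {"detail.input-utr": "A", "_time": 100},
    {"detail.input-utr": "A", "_time": 200},
]

# Equivalent of: | sort 0 detail.input-utr,-_time
for row in spl_sort(events, ["detail.input-utr", "-_time"], count=0):
    print(row["detail.input-utr"], row["_time"])
# A 200
# A 100
# B 50
```

Without the 0, a search returning more than 10,000 events would be truncated by the sort before the dedup ever sees the remaining events.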

Many thanks and kind regards

Chris


IRHM73
Motivator

Hi Tom, that got it! What a difference a fresh pair of eyes makes.

Many thanks and kind regards

Chris


tom_frotscher
Builder

Hey,

Glad to help! It's always good to get a second opinion whenever you're stuck on a problem.
I converted my comment to an answer. Feel free to mark your question as answered.

Greetings

Tom
