Splunk Search

Why does using chart and stats give me different results when finding the sum of a field?

Communicator

I noticed this too. It looks like the chart command sums up the field differently???

Here’s what I’m talking about:

source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
| bucket span=1w _time AS WeekStart
| chart sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
| convert ctime(WeekStart)

source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
| bucket span=1w _time AS WeekStart
| stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
| convert ctime(WeekStart)

alt text

Thanks

Tags (3)
0 Karma

Communicator

I think I found what is wrong. Some duration values have commas such as 2,455 so not sure how the splunk functions treat numbers and strings. I am using a regex to fix the issue. will post here when I finish it.

0 Karma

SplunkTrust
SplunkTrust

you don't needregex, just use this

| convert num(duration)
0 Karma

SplunkTrust
SplunkTrust

I would use convert rmcomma command instead of rex. Like this

source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart | convert rmcomma(duration)
 | stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart)
0 Karma

Communicator

I wanna add rmcomma as the answer for this thread. How do I do that? I want to give points to everyone who helped me.

0 Karma

SplunkTrust
SplunkTrust

I think your comments was the answer for your question. You can just vote up/like all the comments that helped you to resolving the issue.

0 Karma

SplunkTrust
SplunkTrust

What result do you get when you run this

source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart
 | chart sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart)
 | append [search source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart
 | stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart) ]
0 Karma

SplunkTrust
SplunkTrust

First, when investigating things like this, put in an exact timeframe so that the moment you happen to run the queries cannot change the results. I'd bet the difference of 1 in the Count figure is due to the difference between start time of the jobs.

earliest="01/18/2017:02:00:00" latest="01/18/2017:02:02:00"

Second, it's looking like when you limit the query to the exact time, you'll end up with the chart results being exactly twice the stats results for the item Sum. I'll poke around and see if I can find an explanation.

In the meantime, please add a term to each query to get the average of the "duration" field. Also, please check whether there are any records with null values or multiple values in the duration field.

| checkmv = mvcount("duration")
| search checkmv !=1
0 Karma

Communicator

nope it didnt return me any events with multi value in the 'duration' field. Yeah it is weird, there is a difference of 1 when counting 2253 vs 2254. But with CHART the sum up value doubles.

0 Karma

SplunkTrust
SplunkTrust

That makes sense (no mv on duration field) otherwise the count values would've been doubled as well.

0 Karma

SplunkTrust
SplunkTrust

If you used the mvcount!=1, then it also means no null records also (they would be zero).

Try removing the quotes around "duration" - here, run these

 source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
 | eval myDuration1 = duration
 | eval myDuration2 = "duration"
 | bucket span=1w _time AS WeekStart
 | chart 
    sum(myDuration1) AS Sum1, count(myDuration1) as Count1, 
    sum(myDuration2) AS Sum2, count(myDuration2) as Count2, 
    sum("duration") AS Sum3, count("duration") as Count3, 
    sum(duration) AS Sum4, count(duration) as Count4, 
    count as EventCount BY WeekStart
 | convert ctime(WeekStart) 

 source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | eval myDuration1 = duration
 | eval myDuration2 = "duration"
 | bucket span=1w _time AS WeekStart
 | stats 
    sum(myDuration1) AS Sum1, count(myDuration1) as Count1, 
    sum(myDuration2) AS Sum2, count(myDuration2) as Count2, 
    sum("duration") AS Sum3, count("duration") as Count3, 
    sum(duration) AS Sum4, count(duration) as Count4, 
    count as EventCount BY WeekStart
 | convert ctime(WeekStart)
0 Karma

SplunkTrust
SplunkTrust

with any luck, 1 and 4 will be correct and 2 and 3 will match what you've gotten before.

0 Karma

Communicator

I fixed the issue by removing the comma from the 'duration' field:
| rex mode=sed field="Ads viewed upto 25_ of total ad duration" "s/,//g"

Now both queries match. It looks like stats will treat a number like 3,455 as a string and not the number 3455. Once I removed the comma it was summing up correctly.

does that make sense? sorry I am new to Splunk.

0 Karma

SplunkTrust
SplunkTrust

Yes that's the way to go. (validated this behavior difference between stats and chart using following runwhere sample

| gentimes start=-1 | eval duration="2,345" | stats sum(duration)   ----no result
|gentimes start=-1 | eval duration="2,345" | chart sum(duration)  --- result is 2345 

)

0 Karma