Splunk Search

Why does using chart and stats give me different results when finding the sum of a field?

maximusdm
Communicator

I noticed this too. It looks like the chart command sums up the field differently???

Here’s what I’m talking about:

source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
| bucket span=1w _time AS WeekStart
| chart sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
| convert ctime(WeekStart)

source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
| bucket span=1w _time AS WeekStart
| stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
| convert ctime(WeekStart)

alt text

Thanks

Tags (3)
0 Karma

maximusdm
Communicator

I think I found what is wrong. Some duration values have commas such as 2,455 so not sure how the splunk functions treat numbers and strings. I am using a regex to fix the issue. will post here when I finish it.

0 Karma

DalJeanis
Legend

you don't needregex, just use this

| convert num(duration)
0 Karma

somesoni2
Revered Legend

I would use convert rmcomma command instead of rex. Like this

source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart | convert rmcomma(duration)
 | stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart)
0 Karma

maximusdm
Communicator

I wanna add rmcomma as the answer for this thread. How do I do that? I want to give points to everyone who helped me.

0 Karma

somesoni2
Revered Legend

I think your comments was the answer for your question. You can just vote up/like all the comments that helped you to resolving the issue.

0 Karma

somesoni2
Revered Legend

What result do you get when you run this

source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart
 | chart sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart)
 | append [search source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | bucket span=1w _time AS WeekStart
 | stats sum("duration") AS Sum, count("duration") as Count, count BY WeekStart
 | convert ctime(WeekStart) ]
0 Karma

DalJeanis
Legend

First, when investigating things like this, put in an exact timeframe so that the moment you happen to run the queries cannot change the results. I'd bet the difference of 1 in the Count figure is due to the difference between start time of the jobs.

earliest="01/18/2017:02:00:00" latest="01/18/2017:02:02:00"

Second, it's looking like when you limit the query to the exact time, you'll end up with the chart results being exactly twice the stats results for the item Sum. I'll poke around and see if I can find an explanation.

In the meantime, please add a term to each query to get the average of the "duration" field. Also, please check whether there are any records with null values or multiple values in the duration field.

| checkmv = mvcount("duration")
| search checkmv !=1
0 Karma

maximusdm
Communicator

nope it didnt return me any events with multi value in the 'duration' field. Yeah it is weird, there is a difference of 1 when counting 2253 vs 2254. But with CHART the sum up value doubles.

0 Karma

somesoni2
Revered Legend

That makes sense (no mv on duration field) otherwise the count values would've been doubled as well.

0 Karma

DalJeanis
Legend

If you used the mvcount!=1, then it also means no null records also (they would be zero).

Try removing the quotes around "duration" - here, run these

 source=*file001* index=index001 sourcetype=delta  "duration"=*  earliest=@w latest=now    
 | eval myDuration1 = duration
 | eval myDuration2 = "duration"
 | bucket span=1w _time AS WeekStart
 | chart 
    sum(myDuration1) AS Sum1, count(myDuration1) as Count1, 
    sum(myDuration2) AS Sum2, count(myDuration2) as Count2, 
    sum("duration") AS Sum3, count("duration") as Count3, 
    sum(duration) AS Sum4, count(duration) as Count4, 
    count as EventCount BY WeekStart
 | convert ctime(WeekStart) 

 source=*file001* index=index001 sourcetype=delta "duration"=*  earliest=@w latest=now    
 | eval myDuration1 = duration
 | eval myDuration2 = "duration"
 | bucket span=1w _time AS WeekStart
 | stats 
    sum(myDuration1) AS Sum1, count(myDuration1) as Count1, 
    sum(myDuration2) AS Sum2, count(myDuration2) as Count2, 
    sum("duration") AS Sum3, count("duration") as Count3, 
    sum(duration) AS Sum4, count(duration) as Count4, 
    count as EventCount BY WeekStart
 | convert ctime(WeekStart)
0 Karma

DalJeanis
Legend

with any luck, 1 and 4 will be correct and 2 and 3 will match what you've gotten before.

0 Karma

maximusdm
Communicator

I fixed the issue by removing the comma from the 'duration' field:
| rex mode=sed field="Ads viewed upto 25_ of total ad duration" "s/,//g"

Now both queries match. It looks like stats will treat a number like 3,455 as a string and not the number 3455. Once I removed the comma it was summing up correctly.

does that make sense? sorry I am new to Splunk.

0 Karma

somesoni2
Revered Legend

Yes that's the way to go. (validated this behavior difference between stats and chart using following runwhere sample

| gentimes start=-1 | eval duration="2,345" | stats sum(duration)   ----no result
|gentimes start=-1 | eval duration="2,345" | chart sum(duration)  --- result is 2345 

)

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...