Solved: Re: Sum a series of values based on the distinctne...

kamgineer · ‎02-24-2015

i think its easier to ask my question by showing you some of the data I'm working with:

Mon Feb 23 16:35:07 2015        jobid=22614.10          memory=14000    state=RUN
Mon Feb 23 16:35:07 2015        jobid=22614.11          memory=14000    state=RUN
Mon Feb 23 16:35:07 2015        jobid=22614.12          memory=14000    state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.26          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.27          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.29          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.30          memory=6000     state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.10          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.11          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.12          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.13          memory=14000    state=RUN
....
Tue Feb 24 10:15:34 2015        jobid=22606.431         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.432         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.433         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.434         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.435         memory=6000     state=HELD

This data is dumped every 5 min, and it may or may not change change within 5 min. The important thing to remember that there is a unique jobid identifier.

I would like to see a sum of memory used per hour per state, but i have to take into account the jobid so as to not sum up the multiple memory fields for the same job. So using the above data, jobid=22614.10 with memory=14000 should only be counted once towards the total memory used in that hour.

Sample expected results:

4:00 - RUN=56000 IDLE=20000 HELD=12000
5:00 - RUN=62000 IDLE=14000 HELD=12000
6:00 - RUN=66000 IDLE=10000 HELD=12000

does that make sense?

sanjay_shrestha · ‎02-24-2015

You can try like this:

... | bucket _time span=1h | stats max(memory) by _time jobid state

View solution in original post

sanjay_shrestha · ‎02-24-2015

You can try like this:

... | bucket _time span=1h | stats max(memory) by _time jobid state

kamgineer · ‎02-24-2015

btw, this can be done in bash with the following:

egrep "Tue Feb 24 11:.*:.* 2015" /proj/logs/apps/condor/stats.log |uniq|grep RUN|awk '{print $7}'|sed "s/memory=//"| awk '{s+=$1} END {print s}'

sanjay_shrestha · ‎02-24-2015

Can you post the result of

... | bucket _time span=1h | stats max(memory) by _time jobid state

Or you can try:

sourcetype=condor | bucket _time span=1h | stats max(memory) AS dist_memory by _time jobid state |stats sum(dist_memory) by _time state

kamgineer · ‎02-25-2015

Thanks!

The second query works as expected:

_time   state       sum(dist_memory)
2015-02-24 10:00:00 HELD    690000
2015-02-24 10:00:00 RUN     442000
2015-02-24 11:00:00 HELD    636000
2015-02-24 11:00:00 IDLE    6000
2015-02-24 11:00:00 RUN     1644000

Can you explain to me how the max(memory) makes this work? Is it taking the max of memory for each jobid within that 1hr bucket? (and even though they are the same, it just results in 1 value per hour)?

thanks

kamgineer · ‎02-24-2015

That doesn't seem to work for me.

Maybe i phrase my question differently:
I would need to dedup based on jobid for that 1 hour bucket, i tried this, but it also gave me the wrong results:

sourcetype=condor | bucket _time span=1h | dedup jobid | timechart span=1hr sum(memory) by state

Sum a series of values based on the distinctness of another column

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes