Splunk Search

Sum a series of values based on the distinctness of another column

kamgineer
Explorer

i think its easier to ask my question by showing you some of the data I'm working with:

Mon Feb 23 16:35:07 2015        jobid=22614.10          memory=14000    state=RUN
Mon Feb 23 16:35:07 2015        jobid=22614.11          memory=14000    state=RUN
Mon Feb 23 16:35:07 2015        jobid=22614.12          memory=14000    state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.26          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.27          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.29          memory=6000     state=RUN
Mon Feb 23 16:40:08 2015        jobid=22606.30          memory=6000     state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.10          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.11          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.12          memory=14000    state=RUN
Mon Feb 23 16:45:55 2015        jobid=22614.13          memory=14000    state=RUN
....
Tue Feb 24 10:15:34 2015        jobid=22606.431         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.432         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.433         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.434         memory=6000     state=HELD
Tue Feb 24 10:15:34 2015        jobid=22606.435         memory=6000     state=HELD

This data is dumped every 5 min, and it may or may not change change within 5 min. The important thing to remember that there is a unique jobid identifier.

I would like to see a sum of memory used per hour per state, but i have to take into account the jobid so as to not sum up the multiple memory fields for the same job. So using the above data, jobid=22614.10 with memory=14000 should only be counted once towards the total memory used in that hour.

Sample expected results:

4:00 - RUN=56000 IDLE=20000 HELD=12000
5:00 - RUN=62000 IDLE=14000 HELD=12000
6:00 - RUN=66000 IDLE=10000 HELD=12000

does that make sense?

0 Karma
1 Solution

sanjay_shrestha
Contributor

You can try like this:

... | bucket _time span=1h | stats max(memory) by _time jobid state 

View solution in original post

sanjay_shrestha
Contributor

You can try like this:

... | bucket _time span=1h | stats max(memory) by _time jobid state 

kamgineer
Explorer

btw, this can be done in bash with the following:

egrep "Tue Feb 24 11:.*:.* 2015" /proj/logs/apps/condor/stats.log |uniq|grep RUN|awk '{print $7}'|sed "s/memory=//"| awk '{s+=$1} END {print s}'
0 Karma

sanjay_shrestha
Contributor

Can you post the result of

... | bucket _time span=1h | stats max(memory) by _time jobid state 

Or you can try:

sourcetype=condor | bucket _time span=1h | stats max(memory) AS dist_memory by _time jobid state |stats sum(dist_memory) by _time state

kamgineer
Explorer

Thanks!

The second query works as expected:

_time   state       sum(dist_memory)
2015-02-24 10:00:00 HELD    690000
2015-02-24 10:00:00 RUN     442000
2015-02-24 11:00:00 HELD    636000
2015-02-24 11:00:00 IDLE    6000
2015-02-24 11:00:00 RUN     1644000

Can you explain to me how the max(memory) makes this work? Is it taking the max of memory for each jobid within that 1hr bucket? (and even though they are the same, it just results in 1 value per hour)?

thanks

0 Karma

kamgineer
Explorer

That doesn't seem to work for me.

Maybe i phrase my question differently:
I would need to dedup based on jobid for that 1 hour bucket, i tried this, but it also gave me the wrong results:

sourcetype=condor | bucket _time span=1h | dedup jobid | timechart span=1hr sum(memory) by state
0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...