Splunk Search

Totalize a Rate Over Time

ErikaE
Communicator

I have data coming in from a sensor in the form of units per unit time: one field/value pair for the recorded rate, plus several field/value pairs describing the time of the event. The rate is not recorded at fixed time intervals.

If I want to use this rate to estimate total units over a specific time period, how can I accomplish that? Put another way, I want to sum the area under the rate curve. I plotted a time series like so:

sensor | timechart span=5m avg(Value)

Thanks!

1 Solution

acharlieh
Influencer

If I'm understanding what you want, trapezoidal integration will probably be easiest; here's a first pass at a solution. Note that I'm assuming Value is a rate in units per second, since seconds are the units of _time. You'll need a conversion if the rate is per a different time unit.

sensor | table _time Value | reverse
| streamstats last(Value) as lastValue last(_time) as lastTime current=f window=1
| eval area=(_time - lastTime)*(Value+lastValue)/2
| streamstats sum(area) as total
| timechart span=5m max(total) as total

The first line is your search; we use table to drop every field other than Value and _time, since we need no others (fields would distribute to the indexers and be faster, at the cost of a more verbose search here), and we reverse the results so they run oldest to newest.
On the second line, we use streamstats to pull in the point immediately before the current one.
With that data, on the third line we can calculate the area of each trapezoid, i.e. the estimated units produced between consecutive sensor samples.
Now that we have the areas, we use streamstats again (fourth line) to sum them into a running total of units produced, and finally timechart turns that running total into a regular graph.
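As a cross-check, the same streamstats/eval pipeline can be sketched outside Splunk. This is only an illustration (the function name and the sample timestamps and rates are made up): pair each point with its predecessor, compute the trapezoid area, and keep a cumulative sum, exactly as the SPL above does.

```python
# Trapezoidal running total of a rate sampled at irregular times.
# Mirrors the SPL: each point is paired with the previous point,
# the trapezoid area is computed, and areas are summed as we go.
def running_total(times, rates):
    """times in seconds, rates in units/second; both oldest-first."""
    total = 0.0
    totals = []
    prev_t, prev_r = None, None
    for t, r in zip(times, rates):
        if prev_t is not None:
            total += (t - prev_t) * (r + prev_r) / 2  # trapezoid area
        totals.append(total)
        prev_t, prev_r = t, r
    return totals

# A constant 2 units/s for 10 seconds should totalize to 20 units.
print(running_total([0, 5, 10], [2, 2, 2]))  # [0.0, 10.0, 20.0]
```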

ErikaE
Communicator

Ok, here's a follow up question:

This solution is working great for places where the data is continuous. However, if there is no data for a period of time, the total jumps discontinuously.

Here's an example:

If data comes in at 50 units/min for 60 min, it totalizes to 3,000 units. If there is then a period with no data, the total jumps discontinuously when data resumes, e.g. to 5,000. Within any continuous segment of data it totalizes correctly.

Does that make sense? Is there a way to compensate for that?
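The jump happens because a single trapezoid spans the entire gap, crediting the no-data period with area. One possible compensation, sketched here in Python purely for illustration (the function name and the max_gap threshold are my assumptions, not something from the thread), is to skip any interval longer than a cutoff:

```python
# Gap-aware variant of the trapezoid running total: intervals whose
# time gap exceeds max_gap contribute no area, so no units are
# credited to periods with no data. max_gap is an assumed parameter.
def running_total_gapped(times, rates, max_gap):
    total = 0.0
    totals = []
    prev_t, prev_r = None, None
    for t, r in zip(times, rates):
        if prev_t is not None and (t - prev_t) <= max_gap:
            total += (t - prev_t) * (r + prev_r) / 2
        totals.append(total)
        prev_t, prev_r = t, r
    return totals

# 2 units/s sampled a minute apart, then a 40-minute gap: the gap
# interval is dropped instead of adding phantom units.
print(running_total_gapped([0, 60, 2460, 2520], [2, 2, 2, 2],
                           max_gap=120))
```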

Apologies for all the questions; we are trialing Splunk and I'm having trouble finding documentation examples pertaining to numerical/sensor data. I'm trying to replicate in Splunk analyses we can already do with other tools.

ErikaE
Communicator

I now have a transaction search that can break my data up into segments, where all the events from one continuous segment go into one transaction.

Is there a way to apply the above code to the transactions individually, then add them all together to get the total units?

Here is my simple transaction search:

sensor Value># | transaction maxpause=2m maxevents=-1 

where # is a constant threshold, so the search only returns events where the data is good. How can I apply the approach above to these transactions?
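The per-segment idea can be sketched outside Splunk (the function name, sample numbers, and pause threshold below are made up for illustration): split the samples wherever the pause exceeds a threshold, mimicking transaction maxpause, integrate each segment with the trapezoid rule, then sum the segment totals.

```python
# Split samples into continuous segments (a pause longer than
# max_pause starts a new segment), integrate each segment with the
# trapezoid rule, and return the per-segment totals.
def segment_totals(times, rates, max_pause):
    segments, current = [], []
    prev_t = None
    for t, r in zip(times, rates):
        if prev_t is not None and (t - prev_t) > max_pause:
            segments.append(current)  # pause too long: close segment
            current = []
        current.append((t, r))
        prev_t = t
    if current:
        segments.append(current)
    totals = []
    for seg in segments:
        area = sum((t2 - t1) * (r1 + r2) / 2
                   for (t1, r1), (t2, r2) in zip(seg, seg[1:]))
        totals.append(area)
    return totals

# Two segments separated by a long pause; grand total is their sum.
per_segment = segment_totals([0, 60, 2460, 2520], [2, 2, 2, 2], 120)
print(per_segment, sum(per_segment))  # [120.0, 120.0] 240.0
```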

martin_mueller
SplunkTrust

Divide by 60?

ErikaE
Communicator

I am still enough of a novice that it helps to see the best/natural method to do even a simple task like that. 😐

ErikaE
Communicator

Works great! Can you provide an example of the conversion if the rate is not in units/second? For example, if Value were in units/minute instead of units/second?

Thanks!!!

acharlieh
Influencer

@martin_mueller is correct. If Value were in units/minute, we'd adjust the area calculation by dividing the duration by 60, so that line would become something like:

| eval area=(_time - lastTime)/60 * (Value+lastValue)/2

We adjust the units of our duration so that they cancel the units of the averaged values.
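The cancellation can be checked numerically; this throwaway Python snippet (variable names are mine) confirms that a constant 60 units/minute over one minute totalizes to 60 units when the seconds-based duration is divided by 60:

```python
# Check the unit conversion: _time deltas are in seconds, so a rate
# in units/minute needs the duration divided by 60.
t1, t2 = 0, 60            # one minute apart, measured in seconds
v1 = v2 = 60.0            # constant rate of 60 units/minute
area = (t2 - t1) / 60 * (v1 + v2) / 2
print(area)  # 60.0 units produced in that minute
```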

martin_mueller
SplunkTrust

I was just about to post that :<

martin_mueller
SplunkTrust

If @bmacias84's guess isn't what you're looking for you should post some sample data along with desired results.

ErikaE
Communicator

Unfortunately posting sample data is not an option, but I did include some reference images for what I'm trying to do: basically, numerical integration.

Thanks!

bmacias84
Champion

What I think you want is one of the per_minute(x), per_hour(x), per_day(x), or per_second(x) functions available in the timechart command.

ErikaE
Communicator

Here's my attempt:

sensor | timechart span=5m per_minute(Value) | eval rectsum=(per_minute(Value)/c)

It's still not getting at what I'm trying to do, which is basically (for a start) rectangular integration. Trapezoidal integration would be better, but I figured I'd start with rectangular and build from there.

It looks like the per_* functions can't be used inside eval, which is what I think I'd need next to do the calculation.

https://en.wikipedia.org/wiki/File:Integration_rectangle.svg
https://en.wikipedia.org/wiki/File:Integration_trapezoid.svg
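For reference, the rectangular (left-endpoint) rule those images describe can be sketched in Python like so (the function name and sample points are made up for illustration); each interval contributes its duration times the earlier sample's rate:

```python
# Rectangular (left-endpoint) integration for comparison with the
# trapezoid rule: each interval contributes duration * earlier rate.
def running_total_rect(times, rates):
    total, totals = 0.0, []
    prev_t, prev_r = None, None
    for t, r in zip(times, rates):
        if prev_t is not None:
            total += (t - prev_t) * prev_r  # rectangle, not trapezoid
        totals.append(total)
        prev_t, prev_r = t, r
    return totals

# With a rising rate, the left-rectangle rule undershoots.
print(running_total_rect([0, 10, 20], [1, 3, 5]))  # [0.0, 10.0, 40.0]
```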

bmacias84
Champion

What is c? per_minute is only available as a timechart function, and your eval is implemented incorrectly: per_minute(Value) has to be treated as a field name, not a function. Either wrap the field name in single quotes inside eval, or rename it with as.

sensor | timechart span=5m per_minute(Value) | eval rectsum=('per_minute(Value)'/c)

OR

sensor | timechart span=5m per_minute(Value) as per_min | eval rectsum=(per_min/c)

ErikaE
Communicator

Yep, it was implemented incorrectly. I'm just getting started with Splunk and having a hard time with the learning curve; my background is process-oriented.

c is a constant that converts the rate to the correct units per time.
