I have data coming in from a sensor that comes in the format unit/unit time, where I have a field value pair for the rate recorded and several field value pairs describing the time of the event. The rate is not recorded at a fixed interval in time.
If I want to use this rate to estimate total units over a specific time period, how can I accomplish that? Put another way, I want to be able to sum under a rate curve. I plotted a time series plot like so:
sensor | timechart span=5m avg(Value)
Thanks!
If I'm understanding what you want... Trapezoidal integration will probably be easiest and here's a first pass at a solution that I came up with. I'll note that I'm assuming Value is rate in seconds since that's the units of _time, you'll need a conversion if it's a rate for a different time unit.
sensor | table _time Value | reverse
| streamstats last(Value) as lastValue last(_time) as lastTime current=f window=1
| eval area=(_time - lastTime)*(Value+lastValue)/2
| streamstats sum(area) as total
| timechart span=5m max(total) as total
First line is your search, we use table to remove all of the fields other than Value and _time as we need no others (we could use fields which would be able to be distributed and would be faster, but at the cost of a more verbose search here), and we reverse the results to be oldest to newest.
On the second line, we use streamstats to gather the point immediately previous to the current point.
Using this data on the third line we can now calculate the area of each trapezoid to get the estimated number produced in between each sensor sampling.
Now that we have the areas, we can use streamstats again (4th line) to sum the areas to get the running total of how many produced during our time period and finally we use timechart to make a regular graph of rates.
If I'm understanding what you want... Trapezoidal integration will probably be easiest and here's a first pass at a solution that I came up with. I'll note that I'm assuming Value is rate in seconds since that's the units of _time, you'll need a conversion if it's a rate for a different time unit.
sensor | table _time Value | reverse
| streamstats last(Value) as lastValue last(_time) as lastTime current=f window=1
| eval area=(_time - lastTime)*(Value+lastValue)/2
| streamstats sum(area) as total
| timechart span=5m max(total) as total
First line is your search, we use table to remove all of the fields other than Value and _time as we need no others (we could use fields which would be able to be distributed and would be faster, but at the cost of a more verbose search here), and we reverse the results to be oldest to newest.
On the second line, we use streamstats to gather the point immediately previous to the current point.
Using this data on the third line we can now calculate the area of each trapezoid to get the estimated number produced in between each sensor sampling.
Now that we have the areas, we can use streamstats again (4th line) to sum the areas to get the running total of how many produced during our time period and finally we use timechart to make a regular graph of rates.
Ok, here's a follow up question:
This solution is working great for places where the data is continuous. However, if there is no data for a period of time, the total jumps discontinuously.
Here's an example:
If I had data coming in at 50 units / min for 60 min, it will totalize as 3,000 units. If I then have no data, when the total resumes it has jumped discontinuously, i.e. to 5,000. When it resumes counting it totalizes correctly in a segment of continuous data.
Does that make sense? Is there a way to compensate for that?
Apologies for all the questions, we are trialing splunk and I'm having trouble finding documentation examples pertaining to numerical/sensor data. I'm trying to replicate analyses we can already do with other tools in splunk.
I now have a transaction search that can break my data up into segments, where all the events from one continuous segment go into one transaction.
Is there a way to apply the above code to the transactions individually, then add them all together to get the total units?
Here is my simple transaction search:
sensor Value># | transaction maxpause=2m maxevents=-1
where # is a constant number that means my search only returns events where data is good. How can I apply the above to these transactions?
Divide by 60?
I am still enough of a novice that it helps to see the best/natural method to do even a simple task like that. 😐
Works great! Can you provide an example of conversion if the rate is not in units/seconds? For example, if value was in unit / minute instead of unit / second?
Thanks!!!
@martin_mueller is correct. If value was units/minute then we'd just adjust the area calculation by dividing by 60 so that line would become something like:
| eval area=(_time - lastTime)/60 * (Value+lastValue)/2
We adjust the units on our duration so that it cancels the units from the average of our values.
I was just about to post that :<
If @bmacias84's guess isn't what you're looking for you should post some sample data along with desired results.
Unfortunately posting sample data is not an option, but I did include some reference images for what I'm trying to do--basically numerical integration.
Thanks!
What i think you want is the per_minute(x), per_hour(x), per_day(), or per_second(x) available in the timechart command.
Here's my attempt:
sensor | timechart span=5m per_minute(Value) | eval rectsum=(per_minute(Value)/c)
It's still not getting at what I'm trying to do, which is basically (for a start) rectangular integration. Trapezoidal integration would be better, but I figured that I would start with rectangular and build from there.
It looks like the per_* commands are incompatible with eval, which is what I think I'd need next to do the calculation.
https://en.wikipedia.org/wiki/File:Integration_rectangle.svg
https://en.wikipedia.org/wiki/File:Integration_trapezoid.svg
What is c? per_minute is a only available as a timechart function, but your eval is implemented incorrectly. per_minute(value) should be treated as a column/field name not a function. You either have to use the quotes or the as command.
sensor | timechart span=5m per_minute(Value) | eval rectsum=("per_minute(Value)"/c
OR
sensor | timechart span=5m per_minute(Value) as per_min | eval rectsum=(per_min/c)
Yep, it was implemented incorrectly. I am just getting started with splunk and having a hard time with the learning curve. My background is process-oriented.
c is a constant that corrects the rate to the correct units/time.