Archive
Highlighted

Event by Event Math

New Member

I have a data stream that produces a series of values at a series of times. I need to do running calculations based on the values. Both eval and streamstats seem to have drawbacks that I'm not quite sure how to work around.

Given a series of values:

Time   Value      dt*Val                   Foo
10s      100        1000                   --
20s       50        500                    function(time, value of current step, dt*value *of prior step*)
30s       125       1250

I need to do a series of running calculations based on the prior event.

So eval works great to produce the dt*Value column, but it runs down the whole dataset at one go, so I can't use it to look at the prior step.

On the other hand, streamstats will let me extract a value from the prior step, but I can't figure out how to do calculations based on that value. If I try to execute an eval from the streamstats data, it once again goes through and overwrites the whole data set.

I know what I'm trying to do is make Splunk behave like Excel, but is there a straightforward way of having information added to each event in sequence based on calculations made in the prior event?

Tags (3)
0 Karma
Highlighted

Re: Event by Event Math

SplunkTrust
SplunkTrust

So, here's a run anywhere example that I'll explain, then see if we can't adapt to exactly what you need. You don't mention exactly what calculation you need, so let's see if this gets you close enough to make it work in your case.

| makeresults count=5 
| streamstats count 
| streamstats window=2 sum(count) as RunningSum

Paste that into any recent Splunk version and press enter. Take a look at the results.

Line 1 just generates 5 "events" we can work with.
Line 2 is NOT a streamstats we actually care about - it's a setup. This creates a field that essentially just counts how many event it's seen so far, so in the 5 events you have 1, 2, 3, 4, and 5 as the last count.
Line 3 is what we're after. There's a window=2 which tells it to only look at two events at a time, then it sums the existing count field (which we just added in line 1) into a new field called RunningSum.

RunningSum is the sum of the previous two event's count field. 1, 3, 5, 7 9.

Most of the use cases I've seen where folks want things to act like Excel don't need more advanced ways of using first(X) or last(X) in their streamstats, but in your case you do.

So for your example, you have some pseudocode that you have a function, or want to make one. It looks like

function(time, value of current step, dt*value *of prior step*)

I think you'll want a something akin to this:

| makeresults count=5 
| streamstats count AS Time 
| eval Time = Time * 10
| streamstats window=1 current=f first(Time) AS TimePrevious
| eval MyCalculation = Time * TimePrevious

In there, my streamstats in line 4 uses a window of 1 (e.g. only a single event), but I've told it current=false which means do not include the current event as part of this. Then my first(Time) AS TimePrevious is the value of the Previous event's Time.

I then go on in line 5 to do some random eval with Time multiplied by TimePrevious to show doing math isn't hard at that point. It's not quite your example, but ought to be able to be modified to it pretty easily.

Let us know if this helps! Happy Splunking!
-Rich

0 Karma
Highlighted

Re: Event by Event Math

New Member

That's close, but it doesn't quite work, because I need an iterative process.

Imagine this set of events with one input and three calculated values per event:

A Inc= 2A Dec=0 Total=T
B Inc= 2
B Dec=0.5T Q=T+Inc-Dec
C Inc= 2
C Dec=0.5*Q R=Q+Inc-Dec

So I have a running total, but it is incremented and decremented by calculated values based on the prior step.

The problem I'm having is that both streamstats and eval run on the entire dataset at one time. However, what I need to do is run streamstats and eval alternately on ONE step at a time.

If I run the eval command for "Dec" before streamstats, it just returns zeros, because the fourth column isn't populated yet.
However, if I run the streamstats command to pull down the total first, it just returns zeros, because the eval command for the fourth column hasn't been run yet.

This is a situation where it would be immensely helpful to be able to run an eval command on just one event, and not the whole dataset, but I can't figure out how to do that.

0 Karma
Highlighted

Re: Event by Event Math

SplunkTrust
SplunkTrust

I'm not sure I yet understand the actual need, but that last thing you asked:

| eval MyConditionalField = if(fieldA=XYZ,T+Inc-Dec,null() ) ...

There's a variety of conditional sorts of things you can use in there. In this case if fieldA equals XYZ, then set MyConditionalField to T+Inc-Dec, otherwise leave it unset completely (so the field doesn't even exist).

UNTESTED CODE, by the way. All typos are included free of additional charge. Your mileage may vary. 🙂

0 Karma
Highlighted

Re: Event by Event Math

New Member

What you're suggesting is still running an eval command on every event at once, just in such a way that it only produces a result for one event. What I need is a way to effectively do this:

  • Eval on event 1 only
  • Streamstats on event 2 only
  • Eval on event 2 only
  • Streamstats on event 3 only
  • etc.

The problem is that I'm not just calculating a running total - I'm running calculations on the previous total, and using the result of that calculation to modify the new one. So I need a fresh eval after each step - I can't run it in batch on the whole dataset at once.

0 Karma
Highlighted

Re: Event by Event Math

SplunkTrust
SplunkTrust

It is possible that you may have a round hole with square peg problem. I still don't think so though, so let's recap for a second and make sure we're on the same page. NOTE - there are wrinkles in this that remain to be sorted out, I just don't think they're unsolvable and I think we're 90% there.

You provided this example:

A Inc= 2*A Dec=0 Total=T
B Inc= 2*B Dec=0.5*T Q=T+Inc-Dec
C Inc= 2*C Dec=0.5*Q R=Q+Inc-Dec

I assume for starters that the ABC order is able to be achieved easily (e.g. they're time order or whatever).

I'm pseudo-coding this, but .... Inc is easy, we'll not talk about it. Dec is ...

Wait. You know what? Let me try to whack together a few lines of actual data into one of my test instances and actually do this. Might take me a bit to get around to it, though, if that's OK.

Can we talk quickly about a set of test data so I'm going down the right path?

If I had a time-sequence of values A, B, C, D, .... Let's say they're reasonably small integers. That's like, the only actual data in here you are operating on, all the rest is calculations, right? That means I don't even need to create a data set, I can probably find something that I can use for a dozen iterations of this in my existing data already. 🙂

If that's the case, then you are trying to create your own weighted average with a exponentially decreasing weighting, right? (At least as the tricky part).

Also if it's OK, can I email you (I can get the email you have associated with your Answers login) for further back and forth if necessary so we don't clutter up this answer too much?

0 Karma
Highlighted

Re: Event by Event Math

New Member

Email's fine, thanks. If we sort out a solution, we can put it up here for future users. You're right that the specifics of the data aren't critical for the concept, and that an exponential decay is indeed what we're talking about here. "Weighted Average" is actually a step more than needed here - it's not the weighting that's decreasing exponentially, it's the total itself.

0 Karma
Highlighted

Re: Event by Event Math

SplunkTrust
SplunkTrust

Right right right. Been more than 20 years since college. All those words getting mixed up now. 🙂

0 Karma
Speak Up for Splunk Careers!

We want to better understand the impact Splunk experience and expertise has has on individuals' careers, and help highlight the growing demand for Splunk skills.