Splunk Search

Interpolating non-matching values before correlating two series


We want to produce correlations between two different (timestamp,value) series. We basically want to plot one value against the other and show the results on a chart.
We can get the data we want in a table like this:


As you can see in the example above, we can have gaps (nulls) in the data, corresponding to timestamps when either one or the other series does not have a recorded value.

Can Splunk fill in those gaps by interpolating the missing values? How?

After doing this we would get a table like this:


where x3_interp and y2_interp are values obtained by interpolating the x and y series (spline, linear, etc.).

Then we would apply "| chart v1 by v2" to see the graph.




A native Splunk solution, cross-posted from http://answers.splunk.com/answers/147907/how-to-perform-spectrum-analysis

Here's a run-anywhere example using _internal data coming in every 30s, interpolated to 10s:

  index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
| eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
| eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)

The first line grabs the data and builds a timechart with gaps in it.
The second line collects what is needed to fill each gap: the previous value, the next value, and the timestamps of both.
The last line calculates the naïve linearly interpolated value.
Some results:

_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
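
For readers who find the streamstats/reverse trick hard to follow, here is a rough Python sketch of the same idea, outside Splunk. It is only an illustration: the names `carry_forward` and `interpolate` are made up for this example, and timestamps are simplified to seconds past 00:55:00. A forward pass remembers the last known value, the same pass over the reversed rows finds the next known value, and the gap is filled on the straight line between them.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[float]]  # (time in seconds, value or None for a gap)


def carry_forward(rows: List[Row]) -> List[Optional[Row]]:
    """For each row, the most recent (time, value) seen so far with a non-null value."""
    seen: Optional[Row] = None
    out: List[Optional[Row]] = []
    for t, v in rows:
        if v is not None:
            seen = (t, v)
        out.append(seen)
    return out


def interpolate(rows: List[Row]) -> List[Optional[float]]:
    """Linearly interpolate gaps; None for rows that have a value or lack a neighbour."""
    prev = carry_forward(rows)
    # The same pass over the reversed rows yields the *next* known value;
    # flipping the result back lines it up with the original order.
    nxt = carry_forward(rows[::-1])[::-1]
    result: List[Optional[float]] = []
    for (t, v), p, n in zip(rows, prev, nxt):
        if v is not None or p is None or n is None:
            result.append(None)  # nothing to interpolate for this row
            continue
        (t0, v0), (t1, v1) = p, n
        result.append(v0 + (t - t0) / (t1 - t0) * (v1 - v0))
    return result


# The sample rows from the table above, as seconds past 00:55:00.
rows = [(0, 99.0), (10, None), (20, None), (30, 96.0),
        (40, None), (50, None), (60, 111.0)]
interpolated = interpolate(rows)
print(interpolated)
```

On the sample rows this fills the four gaps with 98, 97, 101 and 106, matching the interpolated_ev column, and leaves the rows that already have a value empty.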


I came upon this while searching for an interpolation solution myself. After comparing your use case with my own, I arrived at the following.

  • If the value space is continuous and a spline is appropriate, R provides several spline functions (but not linear) for constant-rate data; CRAN (R's equivalent of CPAN) offers several spline functions (including linear) for variable-rate data. As @martin_mueller kindly pointed out, R is available as an app in Splunk. I believe that you can also write your own interpolation function as a custom search command.
  • If the value space is discrete and missing values should be interpreted as 0, it depends on the sampling rate. If the sampling rate is constant, nothing needs to be done except filling missing values with 0. If not, Splunk's own timechart function can provide an approximation of linear interpolation. (See this answer by @somesoni2 for a complete solution to fill in the blanks when the sampling rate is variable.)
  • If the value space is discrete and missing values must not be interpreted as 0, interpolation using a custom search command is perhaps the best option.
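
If you do go the custom search command route, the core would be an interpolation function along these lines. This is only a sketch in plain Python, not wired into Splunk's custom search command interface. The names `interp_at` and `align` and the sample data are invented for illustration, and it assumes linear interpolation with no extrapolation beyond either series' first and last samples. It evaluates both series on the union of their timestamps, so every row ends up with an x and a y value that can be plotted against each other.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def interp_at(times: List[float], values: List[float], t: float) -> Optional[float]:
    """Linearly interpolate a sorted series at time t; None outside its sampled range."""
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return values[i]       # exact sample, nothing to interpolate
    t0, t1 = times[i - 1], times[i]
    return values[i - 1] + (t - t0) / (t1 - t0) * (values[i] - values[i - 1])


def align(x: Dict[float, float], y: Dict[float, float]) -> List[Tuple[float, float, float]]:
    """Return (time, x, y) rows on the union of both series' timestamps."""
    xt, yt = sorted(x), sorted(y)
    xv, yv = [x[t] for t in xt], [y[t] for t in yt]
    rows = []
    for t in sorted(set(xt) | set(yt)):
        xi, yi = interp_at(xt, xv, t), interp_at(yt, yv, t)
        if xi is not None and yi is not None:  # skip times only one series covers
            rows.append((t, xi, yi))
    return rows


# Made-up sample: x has no value at t=10, y has no value at t=20.
x = {0: 1.0, 20: 3.0, 30: 4.0}
y = {0: 10.0, 10: 20.0, 30: 40.0}
pairs = align(x, y)
for row in pairs:
    print(row)
```

Dropping rows outside the overlap of the two series is a design choice; carrying the edge value forward or backward would be an alternative if those rows matter for the chart.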