Interpolating non-matching values before correlating two series

Explorer

Hello,
We want to produce correlations between two different (timestamp,value) series. We basically want to plot one value against the other and show the results on a chart.
We can get the data we want in a table like this:

timestamp,value1,value2
123456789,x1,y1
123456790,x2,
123456800,,y3
...

As you can see in the example above, we can have gaps (nulls) in the data, corresponding to timestamps when either one or the other series does not have a recorded value.

Can Splunk fill in those gaps by interpolating the missing values? How?

After doing this we would get a table like this:

timestamp,value1,value2
123456789,x1,y1
123456790,x2,y2_interp
123456800,x3_interp,y3
...

where x3_interp and y2_interp are values obtained by interpolating the x and y series (spline, linear, etc.).

Then we would apply "| chart v1 by v2" to see the graph.

Cheers,
Alex


SplunkTrust

A native Splunk solution, cross-posted from http://answers.splunk.com/answers/147907/how-to-perform-spectrum-analysis

Here's a run-anywhere example using _internal data coming in every 30s, interpolated to 10s:

  index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
| eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
| eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)

The first line grabs the data and builds a timechart with gaps in it.
The second line collects what is needed to fill each gap: the previous value, the next value, the time of the previous value, and the time of the next value.
The last line calculates the naïve linearly interpolated value.
Some results:

_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
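
For anyone who wants to check the arithmetic outside Splunk, the same previous/next idea can be sketched in Python. This is only an illustration, not part of the search above: the function name interpolate_gaps and the use of None for a gap are my own assumptions. Unlike the SPL, it copies real values through instead of leaving those rows blank.

```python
from typing import List, Optional, Sequence


def interpolate_gaps(
    times: Sequence[float], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Linearly fill None entries between the nearest known neighbours.

    Hypothetical helper for illustration. Gaps before the first or after
    the last known value stay None (no extrapolation).
    """
    n = len(values)

    # Forward pass: index of the most recent known value (like the first streamstats).
    prev_idx: List[Optional[int]] = [None] * n
    last = None
    for i in range(n):
        if values[i] is not None:
            last = i
        prev_idx[i] = last

    # Backward pass: index of the next known value (like reverse + streamstats).
    next_idx: List[Optional[int]] = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            nxt = i
        next_idx[i] = nxt

    result: List[Optional[float]] = []
    for i in range(n):
        p, q = prev_idx[i], next_idx[i]
        if values[i] is not None:
            result.append(values[i])  # real value, nothing to interpolate
        elif p is None or q is None:
            result.append(None)  # gap at the edge, no neighbour on one side
        else:
            fraction = (times[i] - times[p]) / (times[q] - times[p])
            result.append(values[p] + fraction * (values[q] - values[p]))
    return result


if __name__ == "__main__":
    seconds = [0, 10, 20, 30, 40, 50, 60]
    ev = [99, None, None, 96, None, None, 111]
    print(interpolate_gaps(seconds, ev))
```

Fed the ev column from the results above at 10-second steps, it fills the gaps with 98, 97, 101 and 106 (up to floating-point rounding), which matches the interpolated_ev column.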

Builder

I came upon this while searching for an interpolation solution myself. After comparing your use case with my own, I came to the following conclusions.

  • If the value space is continuous and a spline is appropriate, R provides several spline functions (but not linear) for constant-rate data; CRAN (R's equivalent to CPAN) offers several spline functions (including linear) for variable-rate data. As @martin_mueller kindly pointed out, R is available in Splunk through an app. I believe you can also write your own interpolation function as a custom search command.
  • If the value space is discrete and missing values should be interpreted as 0, the approach depends on the sampling rate. If the sampling rate is constant, nothing needs to be done except filling missing values with 0. If not, Splunk's own timechart function can provide an approximation of linear interpolation. (See this answer by @somesoni2 for a complete solution to fill in the blanks when the sampling rate is variable.)
  • If the value space is discrete and missing values must not be interpreted as 0, interpolation using a custom search command is perhaps the best option.
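
To make the custom search command idea a bit more concrete, here is a minimal sketch of the core logic such a command might wrap for the two-series case in the original question. It is plain Python with no Splunk SDK calls. The names interp_at and align are hypothetical, and it assumes each series is a list of (timestamp, value) pairs sorted by timestamp with no duplicate timestamps. Only linear interpolation is shown; a spline would need a different routine.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Assumption: sorted by timestamp, timestamps unique within a series.
Series = List[Tuple[float, float]]


def interp_at(series: Series, t: float) -> Optional[float]:
    """Value of the series at time t, linearly interpolated if needed.

    Returns None when t lies outside the series' time range, since this
    sketch does not extrapolate.
    """
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]  # exact match, use the recorded value
    if i == 0 or i == len(times):
        return None  # before the first or after the last sample
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (t - t0) / (t1 - t0) * (v1 - v0)


def align(
    series_x: Series, series_y: Series
) -> List[Tuple[float, Optional[float], Optional[float]]]:
    """Build (timestamp, value1, value2) rows over the union of timestamps."""
    all_times = sorted({t for t, _ in series_x} | {t for t, _ in series_y})
    return [(t, interp_at(series_x, t), interp_at(series_y, t)) for t in all_times]


if __name__ == "__main__":
    x = [(0, 1.0), (10, 3.0)]
    y = [(0, 10.0), (5, 20.0), (10, 30.0)]
    for row in align(x, y):
        print(row)
```

Each output row has both values filled in wherever the timestamp falls inside both series, so the rows can be plotted one value against the other. Timestamps outside a series' range are left as None rather than guessed.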