How to perform spectrum analysis?

Builder

I do not see FFT or other Fourier transform functions. If I must use an external script, I need the output to be searchable, as a summary index or something. How do I do that?

1 Solution

Builder

Following @martin_mueller's R-rated suggestion and help from the R-rated app's author @rfujara_splunk 😉, as well as a frantic search for cheap interpolation, here is a recipe to analyse event counts.

  | timechart count
  | appendpipe [
    | stats count
    | addinfo
    | eval temp=info_min_time."##".info_max_time
    | fields temp count
    | makemv temp delim="##"
    | mvexpand temp 
    | rename temp as _time
  ] | timechart max(count) as COUNT
  | fillnull
  | eventstats count as TOTAL
  | r "output=transform(input,FFT=Mod(fft(COUNT)),Freq=((1:TOTAL)-1)/(TOTAL*X_span))"

Application notes

  1. You need to install the R app. See @martin_mueller's answer in this thread.
  2. For event counts, gaps should be interpreted as 0. The largest part of the above search does just that, thanks to @somesoni2's answer to my question.
  3. The eventstats to obtain TOTAL is superfluous and a waste of computation. There should be a better way to do this within R.
  4. The above outputs only the modulus of the transform because counts are all real numbers. You can output the complex numbers by removing Mod() from the above. (Interestingly, although Splunk lacks complex-number arithmetic, its stats functions accept complex numbers. Maybe they take the real part and discard the imaginary part as NaN.)
  5. Freq is a dummy sequence for interpretation, expressed in hertz. You can chart over Freq, for example.
  6. The maximum frequency you can analyse is 0.5/span. The span in both timechart calls must be equal.
  7. Beware of an undesirable side effect of the timechart used to fill gaps: it forces an extra interval.
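
For readers who want to check the arithmetic outside Splunk, here is a minimal Python sketch of notes 2, 5 and 6: gaps become 0, Freq is k/(N*span), and the top usable frequency is 0.5/span. It is not what the R app runs; the data, the 10-second span and the function names are made up for illustration, and the DFT is the slow textbook definition rather than a real FFT.

```python
import cmath

def dft_modulus(samples):
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def frequency_axis(n, span_seconds):
    """Frequency in hertz for each bin: k / (N * span), as in the Freq field."""
    return [k / (n * span_seconds) for k in range(n)]

# Hypothetical input: bin index -> event count; a missing bin means "no events".
span = 10                      # seconds per bin (assumed)
sparse_counts = {0: 4, 2: 4, 4: 4, 6: 4}
n_bins = 8

# Note 2: interpret gaps as 0 so the samples are evenly spaced.
counts = [sparse_counts.get(i, 0) for i in range(n_bins)]

modulus = dft_modulus(counts)
freq = frequency_axis(n_bins, span)
nyquist = 0.5 / span           # note 6: highest frequency you can analyse

for f, m in zip(freq, modulus):
    print(f"{f:.4f} Hz  {m:.3f}")
```

The counts alternate every bin, so apart from the frequency-0 term the only energy sits in the bin at exactly 0.5/span.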

A few F(FT)-words

  1. As the discrete Fourier transform goes, you only need to look at half of the output sequence (the positive frequencies) when inputs are all real.
  2. When analyzing (all-positive) event counts, output at frequency 0 is meaningless, as this component contains the strong DC bias.
  3. fft() uses a rectangular sampling window. Spectral leakage could blur your analysis, especially when dealing with black-and-white data such as event counts.
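
A rough Python sketch of the three points above, under my own assumptions: it keeps only the positive-frequency half, subtracts the mean so the frequency-0 term stops dominating, and applies a Hann window as one common way to reduce leakage. The Hann window is my substitution for illustration; the recipe in this thread applies no window at all. The data and function names are hypothetical.

```python
import cmath
import math

def dft_modulus(samples):
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def detrend_and_window(samples):
    """Subtract the mean (the DC bias), then taper with a symmetric Hann window."""
    n = len(samples)
    mean = sum(samples) / n
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    return [(x - mean) * w for x, w in zip(samples, hann)]

def positive_half(spectrum):
    """Keep bins 0..N/2, the non-redundant half for real-valued input."""
    return spectrum[: len(spectrum) // 2 + 1]

# Hypothetical all-positive counts with a period of 4 bins.
counts = [5, 9, 5, 1] * 4

raw = positive_half(dft_modulus(counts))
cleaned = positive_half(dft_modulus(detrend_and_window(counts)))

print("strongest raw bin:    ", raw.index(max(raw)))
print("strongest cleaned bin:", cleaned.index(max(cleaned)))
```

In the raw spectrum the strongest bin is frequency 0, which says nothing about periodicity; after mean removal the strongest bin is the one matching the 4-bin period.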

R-rated notes

  1. The object input from Splunk is of the "data frame" class. You need to "transform" it into arrays that most R functions deal with. The transform() function in the above has nothing to do with Fourier transformation; the latter is performed by the fft() function.
  2. In addition to the fields you pass to R, input also carries certain Splunk internal fields as X-rated objects. In the above, X_span is the span in the last stats function (timechart); you also have access to X_time, which corresponds to _time in Splunk. (This is perhaps not limited to the R app.)

The above doesn’t address how to separate data series into R arrays then output transformed objects. That will be my end goal. But it’s a good start.

Builder

Finally figured out how to handle multiple Splunk data series. R also has this concept of "multivalue", hence mvfft().

  | r "
  D=length(input)-1
  N=length(input[[1]])
  N_span=N*input$X_span
  output=data.frame(Freq=((1:N)-1)/(N_span),Mod(mvfft(as.matrix(input[2:D]))))
  "

Here, X_span is from Splunk _span. (You can also access Splunk _time in X_time.) R app adds "X" to input series names. For example, if you do timechart count as COUNT by host, it will output Freq and Xhost1, Xhost2, etc.

Filling gaps with 0 in timechart is not the best interpolation for FFT. It is better to use R's own capability.
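
To show what the column-wise transform amounts to, here is a small Python sketch that transforms each series independently and attaches a shared Freq column. It is only an analogy to mvfft() over a matrix, not the R app's behaviour; the series names, the 60-second span and the function names are hypothetical.

```python
import cmath

def dft_modulus(samples):
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def spectra_by_series(table, span_seconds):
    """Transform every column separately; all columns must have the same length."""
    lengths = {len(column) for column in table.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same number of samples")
    n = lengths.pop()
    out = {"Freq": [k / (n * span_seconds) for k in range(n)]}
    for name, column in table.items():
        out[name] = dft_modulus(column)
    return out

# Hypothetical per-host counts, as "timechart count by host" might produce.
table = {"host1": [1, 0, 1, 0], "host2": [2, 2, 2, 2]}
result = spectra_by_series(table, span_seconds=60)

for name, values in result.items():
    print(name, [round(v, 6) for v in values])
```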

Builder

That's a really interesting bug. It doesn't show in preview mode.

SplunkTrust

Lovely writeup... however, you're suffering from a Splunk Answers bug that doesn't let you use more than a certain number of backtick-enclosed code segments like this, see those eventstats0 eventstats1 etc. bits near the end.

SplunkTrust

I believe R is capable of FFT. Take a look at http://apps.splunk.com/app/1735/ for using R within Splunk.

SplunkTrust

Streamstats isn't expensive in and of itself, it runs over the data once... however, there's two streamstatses and two reverses in there, so for large data sets it's going to add up.

Builder

Not familiar with cost of streamstats, but excellent work on a straight-Splunk interpolation. You may want to give an answer in http://answers.splunk.com/answers/79513/. I made a nuanced analysis there.

For my use case, I need to make sure missing data are treated as 0. @somesoni2 offered an inexpensive way to do this in http://answers.splunk.com/answer_link/149598/.

SplunkTrust

The first line of the search grabs data and builds a timechart with data gaps in it.
The second line prepares the data needed to fill in the gaps: previous value, next value, time of previous value, time of next value.
The last line calculates the naïve linearly interpolated value.

Some results:

_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
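
The same two-pass idea (carry the last known value forward, carry the next known value backward, then interpolate between them) can be sketched in Python to check the numbers. This is an illustration with a hypothetical function name, not a replacement for the search; unlike the search, it returns the known value on rows that already have one, and it leaves leading or trailing gaps as None.

```python
def interpolate_gaps(times, values):
    """Linearly interpolate None entries between the nearest known neighbours."""
    n = len(values)

    # Forward pass: most recent known (time, value) at or before each row.
    last_known = [None] * n
    seen = None
    for i in range(n):
        if values[i] is not None:
            seen = (times[i], values[i])
        last_known[i] = seen

    # Backward pass: next known (time, value) at or after each row.
    next_known = [None] * n
    seen = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            seen = (times[i], values[i])
        next_known[i] = seen

    result = []
    for i in range(n):
        if values[i] is not None:
            result.append(float(values[i]))
        elif last_known[i] is None or next_known[i] is None:
            result.append(None)  # no neighbour on one side, cannot interpolate
        else:
            (t0, v0), (t1, v1) = last_known[i], next_known[i]
            result.append(v0 + (times[i] - t0) / (t1 - t0) * (v1 - v0))
    return result

# Seconds since 00:55:00, with ev known only every 30 s (values from the table).
times = [0, 10, 20, 30, 40, 50, 60]
ev = [99, None, None, 96, None, None, 111]
print(interpolate_gaps(times, ev))
```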

SplunkTrust

Here's a run-anywhere example using _internal data coming in every 30s, interpolated to 10s:

  index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
  | eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
  | eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)

SplunkTrust

If you have more data points than you need, you can make them evenly spaced using timechart.

If you have too few data points, you can do the same and throw some streamstats shenanigans into the mix... it won't be fast for a large data set though.

Builder

Another note: FFT operates only on evenly spaced samples, i.e., data with a constant sampling rate. The majority of Splunk data are not constant-rate. I have yet to find an easy way to interpolate.

Builder

🙂 Or I can just use the PDF; in fact, R (thoughtfully) provides an EPUB version, too. I'm just extremely uncomfortable reading serious documents on screen. (But of course, I'm not going to convert 3K pages into dead trees, either.)

SplunkTrust

You could probably buy a dedicated R-manual-Kindle for the price of printing that 😄

Builder

Thank you! With 3,397 pages of reference manual and a 155-page intro, I still have a lot of trees to kill. But yes, FFT is expressed in one function! And the R app makes it all integral within Splunk. Brilliant.
