## How to perform spectrum analysis?

Builder

I do not see FFT or other Fourier transform functions. If I must use an external script, I need the output to be searchable, as a summary index or something. How do I do that?

1 Solution
Builder

Following @martin_mueller's R-rated suggestion and help from R-rated app author @rfujara_splunk😉 as well as a frantic search for cheap interpolation, the following is a recipe to analyse event counts.

```
| timechart count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart max(count) as COUNT
| fillnull
| eventstats count as TOTAL
| r "output=transform(input,FFT=Mod(fft(COUNT)),Freq=((1:TOTAL)-1)/(TOTAL*X_span))"
```

Application notes

1. You need to install the R app. See @martin_mueller's answer.
2. For event counts, gaps should be interpreted as 0. The largest part of the above search does just that, thanks to @somesoni2's answer to my question.
3. The `eventstats` to obtain `TOTAL` is superfluous and a waste of computation. There should be a better way to do this within R.
4. The above outputs only the modulus of the transform, because the counts are all real numbers. You can output the complex values by removing `Mod()` from the above. (Interestingly, although Splunk lacks complex-number arithmetic, its stats functions accept complex numbers. Maybe they take the real part and discard the imaginary part as NaN.)
5. `Freq` is a helper sequence for interpretation: the frequency of each output bin, expressed in hertz (with `span` in seconds). You can chart over `Freq`, for example.
6. The maximum frequency you can analyse is 0.5/`span`. `span` in both `timechart` calls must be equal.
7. Beware of an undesirable side effect of `timechart` used to fill gaps: It forces an extra interval.
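
Notes 5 and 6 boil down to a little arithmetic, sketched here in plain Python outside Splunk. The helper names `freq_axis` and `nyquist` are made up for illustration, and `span` is assumed to be the bucket width in seconds:

```python
def freq_axis(total, span):
    """Frequency in hertz of each FFT output bin.

    Mirrors Freq=((1:TOTAL)-1)/(TOTAL*X_span) from the search above:
    bin k (counting from 0) sits at k / (total * span).
    """
    return [k / (total * span) for k in range(total)]


def nyquist(span):
    """Highest frequency that can be resolved with buckets of `span` seconds."""
    return 0.5 / span


# Example: 360 buckets of 10 s each cover one hour.
freqs = freq_axis(360, 10)
print(freqs[1])      # spacing between bins: one cycle per hour, in Hz
print(nyquist(10))   # 0.05 Hz, i.e. one cycle per 20 s
```

Bins above the `nyquist(span)` value mirror the ones below it, so only the first half of `freqs` is worth charting.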

A few F(FT)-words

1. As with any discrete Fourier transform, you only need to look at the first half of the output sequence (positive frequencies) when the inputs are all real; the second half mirrors it.
2. When analyzing (all-positive) event counts, the output at frequency 0 is meaningless for periodicity: it is just the sum of the counts, i.e. the strong DC bias.
3. `fft()` uses a rectangular sampling window. Spectral leakage can blur your analysis, especially when dealing with black-and-white data such as event counts.
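
The three points above can be checked numerically. The following is a minimal sketch in plain Python, not the R app: `dft` is a naive stand-in for R's `fft()`, the counts are made up, and the Hann window is only one possible taper, chosen here as an assumption.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform, same convention as R's fft()."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def hann(n):
    """Hann window coefficients, one common taper for reducing leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


counts = [3, 0, 5, 1, 4, 0, 6, 2]          # made-up event counts
mod = [abs(c) for c in dft(counts)]

# 1. Real input: the modulus is mirrored, so bins above n/2 add nothing new.
mirrored = all(abs(mod[k] - mod[-k]) < 1e-9 for k in range(1, len(mod)))

# 2. Bin 0 is just the sum of the counts (the DC bias) ...
dc = mod[0]
# ... and subtracting the mean first removes it.
mean = sum(counts) / len(counts)
dc_removed = abs(dft([c - mean for c in counts])[0])

# 3. A tone that does not fit the window a whole number of times leaks into
#    distant bins; tapering with a Hann window suppresses that leakage.
n = 32
tone = [math.cos(2 * math.pi * 2.5 * t / n) for t in range(n)]
leak_rect = abs(dft(tone)[10])
leak_hann = abs(dft([w * x for w, x in zip(hann(n), tone)])[10])

print(mirrored, dc, dc_removed)
print(leak_rect, leak_hann)
```

Subtracting the mean before transforming is a cheap way to keep the DC bin from dominating a chart of the spectrum.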

R-rated notes

1. Object `input` from Splunk is of the "data frame" class. `transform()` evaluates its expressions with the data frame's columns available as plain vectors, which is what most R functions deal with, and adds the results as new columns. The `transform()` function in the above has nothing to do with Fourier transformation. The latter is performed by the `fft()` function.
2. In addition to the fields you pass to R, `input` also carries certain Splunk internal fields as X-rated objects. In the above, `X_span` is Splunk's `_span`, i.e. the `span` of the last stats function (`timechart`); you also have access to `X_time`, which corresponds to `_time` in Splunk. (This is perhaps not limited to the R app.)

The above doesn’t address how to separate data series into R arrays then output transformed objects. That will be my end goal. But it’s a good start.

Builder

Finally figured out how to handle multiple Splunk data series. R also has this concept of "multivalue", hence `mvfft()`, which transforms each column of a matrix separately.
```
| r "
D=length(input)-1
N=length(input[[1]])
N_span=N*input\$X_span
output=data.frame(Freq=((1:N)-1)/(N_span),Mod(mvfft(as.matrix(input[2:D]))))
"
```

Here, X_span is from Splunk `_span`. (You can also access Splunk _time in X_time.) R app adds "X" to input series names. For example, if you do `timechart count as COUNT by host`, it will output `Freq` and `Xhost1`, `Xhost2`, etc.
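
For readers without the R app at hand, here is a rough Python sketch of what the column-wise transform amounts to: every series is transformed on its own, and all of them share one `Freq` column. The function names, host names and counts are hypothetical, and the naive DFT merely stands in for `mvfft()`.

```python
import cmath


def dft_modulus(samples):
    """Modulus of a naive DFT (same convention as R's fft())."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]


def spectrum_table(series, span):
    """Transform every series independently and add a shared Freq column."""
    n = len(next(iter(series.values())))
    table = {"Freq": [k / (n * span) for k in range(n)]}
    for name, samples in series.items():
        table[name] = dft_modulus(samples)
    return table


# Hypothetical output of `timechart span=10s count by host`
series = {
    "host1": [2, 2, 2, 2],   # constant: all energy in bin 0
    "host2": [1, 0, 1, 0],   # alternating: energy at the highest frequency
}
table = spectrum_table(series, span=10)
for name, column in table.items():
    print(name, [round(v, 6) for v in column])
```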

Filling 0 in timechart is not the best interpolation for FFT. Better use R's own capability.

Builder

That's a really interesting bug. It doesn't show in preview mode.

SplunkTrust

Lovely writeup... however, you're suffering from a Splunk Answers bug that doesn't let you use more than a certain number of backtick-enclosed code segments `like this`, see those eventstats0 eventstats1 etc. bits near the end.

SplunkTrust

I believe R is capable of FFT, take a look at http://apps.splunk.com/app/1735/ for using R within Splunk.

SplunkTrust

Streamstats isn't expensive in and of itself, it runs over the data once... however, there's two streamstatses and two reverses in there, so for large data sets it's going to add up.

Builder

Not familiar with cost of streamstats, but excellent work on a straight-Splunk interpolation. You may want to give an answer in http://answers.splunk.com/answers/79513/. I made a nuanced analysis there.

For my use case, I need to make sure missing data are treated as 0. @somesoni2 offered an inexpensive way to do this in http://answers.splunk.com/answer_link/149598/.

SplunkTrust

First line grabs data and builds a `timechart` with data gaps in it.
Second line prepares lots of data to fill in the gaps: previous value, next value, time of previous value, time of next value
Last line calculates the naïve linearly interpolated value.

Some results:

```
_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
```
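
The same naïve linear interpolation can be sketched outside Splunk. This is a minimal Python version under the assumption that gaps are represented as `None`; the function name `interpolate_gaps` is made up, and the sample values are the ones from the results above.

```python
def interpolate_gaps(times, values):
    """Linearly interpolate None entries between their nearest known neighbours.

    Same formula as the search: last + (t - t_last) / (t_next - t_last) * (next - last).
    Entries before the first or after the last known value are left as None.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    result = list(values)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (times[i] - times[left]) / (times[right] - times[left])
            result[i] = values[left] + fraction * (values[right] - values[left])
    return result


# Seconds since 00:55:00, one bucket every 10 s, as in the results above
times = [0, 10, 20, 30, 40, 50, 60]
ev = [99, None, None, 96, None, None, 111]
for t, v in zip(times, interpolate_gaps(times, ev)):
    print(t, f"{v:.6f}")
```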
SplunkTrust

Here's a run-anywhere example using `_internal` data coming in every 30s, interpolated to 10s:

```
index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
| eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
| eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)
```
SplunkTrust

If you have more data points than you need you can make them equally paced using `timechart`.

If you have too few data points you can do the same and throw some `streamstats` shenanigans in the mix... won't be fast for a large data set though.

Builder

Another note: FFT operates only on equally paced samples, i.e., data of constant sampling rate. The majority of Splunk data are not constant-rate. I have yet to find an easy way for interpolation.
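
For event counts specifically, equal pacing does not need interpolation: bucketing the timestamps into fixed spans and leaving empty buckets at 0 is enough. A minimal Python sketch of that idea, with a made-up function name and made-up event times:

```python
def bucket_counts(timestamps, start, end, span):
    """Count events per fixed-width bucket; empty buckets stay at 0.

    Buckets are [start, start+span), [start+span, start+2*span), ... up to `end`.
    Events outside [start, end) are ignored.
    """
    n_buckets = int(-(-(end - start) // span))   # ceiling division
    counts = [0] * n_buckets
    for ts in timestamps:
        if start <= ts < end:
            counts[int((ts - start) // span)] += 1
    return counts


# Irregular event times in seconds (made up), bucketed into 10 s spans
events = [0.4, 1.9, 25.0, 31.2, 38.7]
print(bucket_counts(events, start=0, end=40, span=10))   # [2, 0, 1, 2]
```

The resulting list has a constant sampling rate of one value per `span` seconds, which is what an FFT expects.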

Builder

🙂 Or I can just use PDF; in fact, R provides (thoughtfully) EPUB version, too. I'm just extremely uncomfortable reading serious documents on screen. (But of course, I'm not to convert 3K pages into dead trees, either.)

SplunkTrust

You could probably buy a dedicated R-manual-Kindle for the price of printing that 😄

Builder

Thank you! With 3,397 pages of reference manual and a 155-page intro, I still have a lot of trees to kill. But yes, FFT is expressed in one function! And the R app makes it all integral within Splunk. Brilliant.