Following @martin_mueller's R-rated suggestion and help from R-rated app author @rfujara_splunk 😉, as well as a frantic search for cheap interpolation, the following is a recipe to analyse event counts.
| timechart count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart max(count) as COUNT
| fillnull
| eventstats count as TOTAL
| r "output=transform(input,FFT=Mod(fft(COUNT)),Freq=((1:TOTAL)-1)/(TOTAL*X_span))"
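As a cross-check outside Splunk, here is a minimal Python sketch of what the R expression computes: the modulus of each DFT coefficient, paired with a frequency axis ((1:TOTAL)-1)/(TOTAL*X_span). The function name dft_modulus, the sample counts and the 10-second span are made up for illustration; a naive O(N^2) DFT stands in for R's fft().

```python
import cmath

def dft_modulus(counts, span_seconds):
    """Return (frequency_hz, modulus) pairs for equally spaced counts."""
    n = len(counts)
    rows = []
    for k in range(n):
        # Forward DFT coefficient k (same sign convention as R's fft()).
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(counts))
        # Frequency axis: k / (N * span), i.e. ((1:N)-1)/(N*span) in R.
        rows.append((k / (n * span_seconds), abs(coeff)))
    return rows

# Hypothetical event counts per 10-second bucket, repeating every 20 seconds.
rows = dft_modulus([4, 0, 4, 0, 4, 0, 4, 0], span_seconds=10)
for freq, mod in rows:
    print(f"{freq:.4f} Hz  {mod:.3f}")
```

A pattern that repeats every 20 seconds should peak at 0.05 Hz, besides the zero-frequency term that carries the total count.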
Application notes
- eventstats to obtain TOTAL is superfluous and a waste of computation. There should be a better way to do this within R.
- Mod() in the above takes the modulus of the complex output of fft(). (Interestingly, although Splunk lacks complex number arithmetic, its stats functions accept complex numbers. Maybe they take the real part and discard the imaginary part as NaN.)
- Freq is a dummy sequence for interpretation, expressed in hertz. You can chart over Freq, for example.
- span in both timechart calls must be equal.
- The timechart used to fill gaps forces an extra interval.
A few F(FT)-words
- fft() uses a square sampling window. Spectral leakage could blur your analysis, especially when dealing with black-and-white data such as event counts.
R-rated notes
- input from Splunk is in the "data frame" class. You need to "transform" it into arrays that most R functions deal with. The transform() function in the above has nothing to do with the Fourier transform; the latter is performed by the fft() function.
- input also passes certain Splunk internal fields as X-rated objects. In the above, X_span is the span of the last stats function (timechart); you also have access to X_time, which corresponds to _time in Splunk. (This is perhaps not limited to the R app.)
The above doesn't address how to separate data series into R arrays and then output transformed objects. That will be my end goal. But it's a good start.
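To illustrate the note about the square sampling window, here is a small Python sketch comparing leakage with and without a taper. The Hann window is my assumption of one common mitigation; the thread does not prescribe any window. The test signal (4.5 cycles in 64 samples) is made up so that it does not fit the window evenly.

```python
import cmath
import math

def dft_modulus(samples):
    """Modulus of each DFT coefficient (naive O(N^2) DFT)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n)]

N = 64
# A sinusoid with a non-integer number of cycles in the window leaks.
signal = [math.sin(2 * math.pi * 4.5 * i / N) for i in range(N)]
# Periodic Hann window (assumption: chosen only for illustration).
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

rect_spectrum = dft_modulus(signal)  # square window: samples used as-is
hann_spectrum = dft_modulus([s * w for s, w in zip(signal, hann)])

# Compare leakage far away from the true frequency (bin 20 vs. ~4.5).
print(rect_spectrum[20], hann_spectrum[20])
```

The tapered spectrum should show much less energy far from the true frequency, at the cost of a wider main peak.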
A little bit dirty and of course not too fast, but easy to implement as a macro without any library, using built-in trigonometry from Splunk:
| makeresults
| eval x="35 21 39 88"
| makemv x
| mvexpand x
| eventstats count as N
| streamstats count as n
| eval n=n-1
| fields - _time
| eventstats values(n) as k
| mvexpand k
| stats sum(eval(x*cos(2*pi()/N * k * n))) as Fr sum(eval(x*sin(2*pi()/N * k * n) * -1)) as Fi by k
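For anyone who wants to verify the search above outside Splunk, this is a minimal Python sketch of the same sums, using the same four sample values. The function name dft is mine; Fr and Fi correspond to the real and imaginary parts produced by the stats line.

```python
import math

def dft(x):
    """Naive DFT returning (k, Fr, Fi) rows, mirroring the stats line."""
    N = len(x)
    rows = []
    for k in range(N):
        # Real part: sum of x * cos(2*pi/N * k * n)
        fr = sum(v * math.cos(2 * math.pi / N * k * n) for n, v in enumerate(x))
        # Imaginary part: sum of x * sin(2*pi/N * k * n) * -1
        fi = sum(-v * math.sin(2 * math.pi / N * k * n) for n, v in enumerate(x))
        rows.append((k, fr, fi))
    return rows

result = dft([35, 21, 39, 88])
for k, fr, fi in result:
    print(k, round(fr, 6), round(fi, 6))
```

Like the SPL version, this costs N^2 multiplications, so it is only practical for short series.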
Finally figured out how to handle multiple Splunk data series. R also has this concept of "multivalue", hence mvfft().
| r "
D=length(input)-1
N=length(input[[1]])
N_span=N*input$X_span
output=data.frame(Freq=((1:N)-1)/(N_span),Mod(mvfft(as.matrix(input[2:D]))))
"
Here, X_span is from Splunk _span. (You can also access Splunk _time in X_time.) R app adds "X" to input series names. For example, if you do timechart count as COUNT by host, it will output Freq and Xhost1, Xhost2, etc.
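To make the column-wise idea concrete, here is a rough Python sketch of what mvfft() does with several series: one DFT per column, all sharing one Freq axis. The function name column_fft_modulus, the series names host1 and host2, their values and the 60-second span are hypothetical, and a naive DFT stands in for R's implementation.

```python
import cmath

def column_fft_modulus(series_by_name, span_seconds):
    """Apply a DFT to each series and return moduli plus a shared Freq axis."""
    names = list(series_by_name)
    n = len(series_by_name[names[0]])
    # Shared frequency axis: ((1:N)-1)/(N*span) in R notation.
    table = {"Freq": [k / (n * span_seconds) for k in range(n)]}
    for name in names:
        col = series_by_name[name]
        table[name] = [
            abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(col)))
            for k in range(n)
        ]
    return table

# Hypothetical per-host counts, one value per 60-second bucket.
table = column_fft_modulus(
    {"host1": [1, 1, 1, 1], "host2": [1, 0, 1, 0]},
    span_seconds=60,
)
print(table)
```

All series must have the same length and the same span, which timechart guarantees when the series come from a single search.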
Filling 0 in timechart is not the best interpolation for FFT. Better use R's own capability.
That's a really interesting bug. It doesn't show in preview mode.
Lovely writeup... however, you're suffering from a Splunk Answers bug that doesn't let you use more than a certain number of backtick-enclosed code segments like this, see those eventstats0 eventstats1 etc. bits near the end.
I believe R is capable of FFT, take a look at http://apps.splunk.com/app/1735/ for using R within Splunk.
Streamstats isn't expensive in and of itself, it runs over the data once... however, there's two streamstatses and two reverses in there, so for large data sets it's going to add up.
Not familiar with cost of streamstats, but excellent work on a straight-Splunk interpolation. You may want to give an answer in http://answers.splunk.com/answers/79513/. I made a nuanced analysis there.
For my use case, I need to make sure missing data are treated as 0. @somesoni2 offered an inexpensive way to do this in http://answers.splunk.com/answer_link/149598/.
First line grabs data and builds a timechart with data gaps in it.
Second line prepares lots of data to fill in the gaps: previous value, next value, time of previous value, time of next value.
Last line calculates the naïve linearly interpolated value.
Some results:
_time                ev   interpolated_ev
2014-07-30 00:55:00  99
2014-07-30 00:55:10       98.000000
2014-07-30 00:55:20       97.000000
2014-07-30 00:55:30  96
2014-07-30 00:55:40       101.000000
2014-07-30 00:55:50       106.000000
2014-07-30 00:56:00  111
Here's a run-anywhere example using _internal data coming in every 30s, interpolated to 10s:
index=_internal eps="*" group=per_host_thruput | head 10 | timechart fixedrange=f span=10s avg(ev) as ev
| eval value_time = case(isnotnull(ev), _time) | streamstats last(ev) as last_ev last(value_time) as last_time | reverse | streamstats last(ev) as next_ev last(value_time) as next_time | reverse
| eval interpolated_ev = last_ev + ((_time - last_time) / (next_time - last_time)) * (next_ev - last_ev)
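The formula in the last eval can be checked outside Splunk with a few lines of Python. This is only a sketch of the same linear interpolation: the function name interpolate_gaps and the sample values are mine, with times given in seconds from the first bucket. Unlike the search, it also copies the known values into the output instead of leaving them empty.

```python
def interpolate_gaps(times, values):
    """Linearly interpolate None entries between the nearest known neighbours."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(float(v))
            continue
        prev = [(kt, kv) for kt, kv in known if kt < t]
        nxt = [(kt, kv) for kt, kv in known if kt > t]
        if not prev or not nxt:
            # No neighbour on one side: leave the gap unfilled.
            filled.append(None)
            continue
        (last_time, last_ev), (next_time, next_ev) = prev[-1], nxt[0]
        # Same expression as the eval in the search above.
        filled.append(last_ev + ((t - last_time) / (next_time - last_time))
                      * (next_ev - last_ev))
    return filled

times = [0, 10, 20, 30, 40, 50, 60]
values = [99, None, None, 96, None, None, 111]
filled = interpolate_gaps(times, values)
print(filled)
```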
If you have more data points than you need you can make them equally paced using timechart.
If you have too few data points you can do the same and throw some streamstats shenanigans in the mix... won't be fast for a large data set though.
Another note: FFT operates only on equally paced samples, i.e., data of constant sampling rate. The majority of Splunk data are not constant-rate. I have yet to find an easy way for interpolation.
🙂 Or I can just use PDF; in fact, R provides (thoughtfully) EPUB version, too. I'm just extremely uncomfortable reading serious documents on screen. (But of course, I'm not to convert 3K pages into dead trees, either.)
You could probably buy a dedicated R-manual-Kindle for the price of printing that 😄
Thank you! With 3,397 pages of reference manual and a 155-page intro, I still have a lot of trees to kill. But yes, FFT is expressed in one function! And the R app makes it all integral within Splunk. Brilliant.