Hi all Ninja's i need some help here to find this calculation which can be done easily in excel but i wanted to convert to SPL. I have tried to use streamstats but and autoregress but unable to figure out how to do it.
Problem statement:
How do I convert the calculation for mean deviation.
below formula is applied in col : j
explanation of the formula, calculation starts at count of 20, it takes the value of col : i and subtract col : h once it done with its own row take the same i value and subtract from "count - 1" after it finishes 20 period.
hopefully someone could really help as this part of calculation has me stuck for a few days which i unable to submit this to my boss for system migration to splunk.
=(ABS(I22-H22)+ABS(I22-H21)+ABS(I22-H20)+ABS(I22-H19)+ABS(I22-H18)+ABS(I22-H17)+ABS(I22-H16)+ABS(I22-H15)+ABS(I22-H14)+ABS(I22-H13)+ABS(I22-H12)+ABS(I22-H11)+ABS(I22-H10)+ABS(I22-H9)+ABS(I22-H8)+ABS(I22-H7)+ABS(I22-H6)+ABS(I22-H5)+ABS(I22-H4)+ABS(I22-H3))/20
col : h | col : i | col : j | |||||
count | date | stop | top | bottom | avg_distance | moving_avg_20_period | Mean Deviation |
1 | 2022-08-01 | 2.29 | 2.31 | 2.265 | 2.2883 | ||
2 | 2022-08-02 | 2.31 | 2.37 | 2.28 | 2.32 | ||
3 | 2022-08-03 | 2.27 | 2.36 | 2.24 | 2.29 | ||
4 | 2022-08-04 | 2.35 | 2.36 | 2.26 | 2.3233 | ||
5 | 2022-08-05 | 2.44 | 2.57 | 2.34 | 2.45 | ||
6 | 2022-08-08 | 2.49 | 2.51 | 2.41 | 2.47 | ||
7 | 2022-08-09 | 2.41 | 2.52 | 2.35 | 2.4267 | ||
8 | 2022-08-10 | 2.51 | 2.59 | 2.38 | 2.4933 | ||
9 | 2022-08-11 | 2.49 | 2.5176 | 2.43 | 2.4792 | ||
10 | 2022-08-12 | 2.38 | 2.5 | 2.26 | 2.38 | ||
11 | 2022-08-15 | 2.35 | 2.41 | 2.32 | 2.36 | ||
12 | 2022-08-16 | 2.39 | 2.41 | 2.31 | 2.37 | ||
13 | 2022-08-17 | 2.31 | 2.38 | 2.3 | 2.33 | ||
14 | 2022-08-18 | 2.28 | 2.32 | 2.23 | 2.2767 | ||
15 | 2022-08-19 | 2.11 | 2.32 | 2.055 | 2.1617 | ||
16 | 2022-08-22 | 2.08 | 2.19 | 2.07 | 2.1133 | ||
17 | 2022-08-23 | 2.02 | 2.105 | 1.92 | 2.015 | ||
18 | 2022-08-24 | 2.01 | 2.06 | 1.94 | 2.0033 | ||
19 | 2022-08-25 | 2.01 | 2.07 | 1.87 | 1.9833 | ||
20 | 2022-08-26 | 2.02 | 2.06 | 1.93 | 2.0033 | 2.2769 | 0.13814 |
21 | 2022-08-29 | 2.02 | 2.04 | 1.96 | 2.0067 | 2.2628 | 0.15529 |
22 | 2022-08-30 | 2.205 | 2.22 | 2 | 2.1417 | 2.2539 | 0.160265 |
23 | 2022-08-31 | 2.09 | 2.25 | 2.07 | 2.1367 | 2.2462 | 0.16509 |
24 | 2022-09-01 | 1.92 | 2.16 | 1.92 | 2 | 2.23 | 0.173545 |
25 | 2022-09-02 | 1.86 | 1.99 | 1.83 | 1.8933 | 2.2022 | 0.1766 |
26 | 2022-09-06 | 1.8 | 1.86 | 1.73 | 1.7967 | 2.1685 | 0.176745 |
27 | 2022-09-07 | 1.84 | 1.85 | 1.75 | 1.8133 | 2.1379 | 0.175175 |
28 | 2022-09-08 | 1.7 | 1.85 | 1.665 | 1.7383 | 2.1001 | 0.174805 |
29 | 2022-09-09 | 1.75 | 1.76 | 1.63 | 1.7133 | 2.0618 | 0.17136 |
Here is your data in a Splunk example. It gives very slightly different values due to rounding
| makeresults
| eval _raw="count date stop top bottom avg_distance moving_avg_20_period Mean_Deviation
1 2022-08-01 2.29 2.31 2.265 2.2883
2 2022-08-02 2.31 2.37 2.28 2.32
3 2022-08-03 2.27 2.36 2.24 2.29
4 2022-08-04 2.35 2.36 2.26 2.3233
5 2022-08-05 2.44 2.57 2.34 2.45
6 2022-08-08 2.49 2.51 2.41 2.47
7 2022-08-09 2.41 2.52 2.35 2.4267
8 2022-08-10 2.51 2.59 2.38 2.4933
9 2022-08-11 2.49 2.5176 2.43 2.4792
10 2022-08-12 2.38 2.5 2.26 2.38
11 2022-08-15 2.35 2.41 2.32 2.36
12 2022-08-16 2.39 2.41 2.31 2.37
13 2022-08-17 2.31 2.38 2.3 2.33
14 2022-08-18 2.28 2.32 2.23 2.2767
15 2022-08-19 2.11 2.32 2.055 2.1617
16 2022-08-22 2.08 2.19 2.07 2.1133
17 2022-08-23 2.02 2.105 1.92 2.015
18 2022-08-24 2.01 2.06 1.94 2.0033
19 2022-08-25 2.01 2.07 1.87 1.9833
20 2022-08-26 2.02 2.06 1.93 2.0033 2.2769 0.13814
21 2022-08-29 2.02 2.04 1.96 2.0067 2.2628 0.15529
22 2022-08-30 2.205 2.22 2 2.1417 2.2539 0.160265
23 2022-08-31 2.09 2.25 2.07 2.1367 2.2462 0.16509
24 2022-09-01 1.92 2.16 1.92 2 2.23 0.173545
25 2022-09-02 1.86 1.99 1.83 1.8933 2.2022 0.1766
26 2022-09-06 1.8 1.86 1.73 1.7967 2.1685 0.176745
27 2022-09-07 1.84 1.85 1.75 1.8133 2.1379 0.175175
28 2022-09-08 1.7 1.85 1.665 1.7383 2.1001 0.174805
29 2022-09-09 1.75 1.76 1.63 1.7133 2.0618 0.17136
"
| multikv forceheader=1
| table count date stop top bottom avg_distance moving_avg_20_period "Mean_Deviation"
``` FROM HERE ```
| streamstats window=20 sum(avg_distance) as coli_moving_average list(avg_distance) as avg_distance_values
| eval coli_moving_average=if(count>=20,coli_moving_average/20, null())
| eval deviation=if(count>=20, mvmap(avg_distance_values, round(abs(coli_moving_average-avg_distance_values),4)), null())
| eval mean_deviation=avg(deviation)
| fields - deviation avg_distance_values
but the part you want is from streamstats
Hope this helps.
The streamstats is key in that it calculates the moving average and also collects the list of 20 values into the 'current' row.
Then it uses mvmap to do the calcs of the absolute differences.
Here is your data in a Splunk example. It gives very slightly different values due to rounding
| makeresults
| eval _raw="count date stop top bottom avg_distance moving_avg_20_period Mean_Deviation
1 2022-08-01 2.29 2.31 2.265 2.2883
2 2022-08-02 2.31 2.37 2.28 2.32
3 2022-08-03 2.27 2.36 2.24 2.29
4 2022-08-04 2.35 2.36 2.26 2.3233
5 2022-08-05 2.44 2.57 2.34 2.45
6 2022-08-08 2.49 2.51 2.41 2.47
7 2022-08-09 2.41 2.52 2.35 2.4267
8 2022-08-10 2.51 2.59 2.38 2.4933
9 2022-08-11 2.49 2.5176 2.43 2.4792
10 2022-08-12 2.38 2.5 2.26 2.38
11 2022-08-15 2.35 2.41 2.32 2.36
12 2022-08-16 2.39 2.41 2.31 2.37
13 2022-08-17 2.31 2.38 2.3 2.33
14 2022-08-18 2.28 2.32 2.23 2.2767
15 2022-08-19 2.11 2.32 2.055 2.1617
16 2022-08-22 2.08 2.19 2.07 2.1133
17 2022-08-23 2.02 2.105 1.92 2.015
18 2022-08-24 2.01 2.06 1.94 2.0033
19 2022-08-25 2.01 2.07 1.87 1.9833
20 2022-08-26 2.02 2.06 1.93 2.0033 2.2769 0.13814
21 2022-08-29 2.02 2.04 1.96 2.0067 2.2628 0.15529
22 2022-08-30 2.205 2.22 2 2.1417 2.2539 0.160265
23 2022-08-31 2.09 2.25 2.07 2.1367 2.2462 0.16509
24 2022-09-01 1.92 2.16 1.92 2 2.23 0.173545
25 2022-09-02 1.86 1.99 1.83 1.8933 2.2022 0.1766
26 2022-09-06 1.8 1.86 1.73 1.7967 2.1685 0.176745
27 2022-09-07 1.84 1.85 1.75 1.8133 2.1379 0.175175
28 2022-09-08 1.7 1.85 1.665 1.7383 2.1001 0.174805
29 2022-09-09 1.75 1.76 1.63 1.7133 2.0618 0.17136
"
| multikv forceheader=1
| table count date stop top bottom avg_distance moving_avg_20_period "Mean_Deviation"
``` FROM HERE ```
| streamstats window=20 sum(avg_distance) as coli_moving_average list(avg_distance) as avg_distance_values
| eval coli_moving_average=if(count>=20,coli_moving_average/20, null())
| eval deviation=if(count>=20, mvmap(avg_distance_values, round(abs(coli_moving_average-avg_distance_values),4)), null())
| eval mean_deviation=avg(deviation)
| fields - deviation avg_distance_values
but the part you want is from streamstats
Hope this helps.
The streamstats is key in that it calculates the moving average and also collects the list of 20 values into the 'current' row.
Then it uses mvmap to do the calcs of the absolute differences.