Hi,
Here is the default SPL from the Splunk App for Data Science and Deep Learning (Time Series Anomalies with STUMPY - Time Series Anomaly Detection with Matrix Profiles):
| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile
Now I want to run this search on my own data. Here is a sample of the log:
2022-11-30 23:59:00,122,124
2022-11-30 23:58:00,113,112
2022-11-30 23:57:00,144,143
2022-11-30 23:56:00,137,138
2022-11-30 23:55:00,119,120
2022-11-30 23:54:00,103,102
2022-11-30 23:53:00,104,105
2022-11-30 23:52:00,143,142
2022-11-30 23:51:00,138,139
2022-11-30 23:50:00,155,153
2022-11-30 23:49:00,100,102
timestamp: 2022-11-30 23:59:00
logons: 122
Here is the SPL that I run:
| rex field=_raw "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?<logons>\d+)"
| eval _time=strptime(time, "%Y-%m-%d %H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile
Before the fit command, _time is displayed correctly, but after the fit command it is empty!
FYI: logons, matrix_profile, and anomaly are returned correctly; only _time is empty.
Any idea?
Hey there,
Results of the | fit command are affected by the time range picker. Once you set the time range to all time, _time is displayed normally.
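One way to see what the time range picker actually passes to the search is to run addinfo after the lookup and compare the picker bounds with the first and last timestamps in the data. This is only a diagnostic sketch: it reuses the lookup from the default example, the output field names (first_obs, last_obs, picker_earliest, picker_latest) are made up for illustration, and it has not been run against this dataset.

```spl
| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| addinfo
| stats min(_time) as first_obs max(_time) as last_obs values(info_min_time) as picker_earliest values(info_max_time) as picker_latest
| eval first_obs=strftime(first_obs, "%Y-%m-%d %H:%M:%S"), last_obs=strftime(last_obs, "%Y-%m-%d %H:%M:%S")
```

picker_earliest and picker_latest are left as raw values on purpose: with the picker set to All time, picker_latest typically shows +Infinity rather than an epoch timestamp, which makes it easy to tell whether latest is bounded.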
Edit: I looked into the interaction between inputlookup + fit + time range picker. As documented here, the results of the fit command are appended to the initial dataset. In this case, the expected outcome would be a table that includes only the rows covered by the time range picker. However, the following happens:
Time range picker: All time
Resulting table: Initial dataset + output of fit command
Result: OK, expected result

Time range picker: Some time before the first observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, expected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: About halfway through the dataset timestamps - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: Some time after the last observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: Some time before the first observation - some timestamp after the last observation
Resulting table: Output of fit command
Result: NOT OK, unexpected result
I checked the sources that were available to me (search.log, .py files) but sadly this did not suffice to reverse engineer how the initial dataset and the output of the fit command are merged and filtered. It seems that earliest has no effect, but once latest is set to a timestamp, the behavior becomes unexpected.
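Because the implicit filtering is hard to reason about, one option is to make the time filter explicit before fit, using the bounds that addinfo exposes. The sketch below is an untested assumption, not a confirmed fix: it applies the common addinfo + where pattern to the default example, and whether it changes how fit merges its output with the initial dataset is not verified here.

```spl
| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| addinfo
| where _time>=info_min_time AND (info_max_time="+Infinity" OR _time<=info_max_time)
| fields - info_min_time info_max_time info_search_time info_sid
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
```

The where clause keeps only rows inside the picker range and treats an unbounded latest (+Infinity) as "no upper limit". The timechart step may still build its buckets from the picker range, so the >50000 rows warning can still appear with a very wide range.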
@pdrieger_splunk any idea?