Issue with "_time" after using the fit command in DLTK

indeed_2000
Motivator

Hi

Here is the default SPL from the Splunk App for Data Science and Deep Learning (example: Time Series Anomalies with STUMPY - Time Series Anomaly Detection with Matrix Profiles):

| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile

Now I want to run the same search on my own data. Here is a sample of the log:

2022-11-30 23:59:00,122,124
2022-11-30 23:58:00,113,112
2022-11-30 23:57:00,144,143
2022-11-30 23:56:00,137,138
2022-11-30 23:55:00,119,120
2022-11-30 23:54:00,103,102
2022-11-30 23:53:00,104,105
2022-11-30 23:52:00,143,142
2022-11-30 23:51:00,138,139
2022-11-30 23:50:00,155,153
2022-11-30 23:49:00,100,102

For example, in the first line:

timestamp: 2022-11-30 23:59:00
logons: 122

Here is the SPL that I run:
| rex field=_raw "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?<logons>\d+)"
| eval _time=strptime(time, "%Y-%m-%d %H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile

Before the fit command, _time is shown correctly, but after the fit command it is empty.

FYI: logons, matrix_profile and anomaly are returned correctly; only _time is empty.
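
One way to confirm exactly where _time is lost is to count rows with and without a timestamp right after the fit step. This is only a diagnostic sketch: the base search (index=<your_index> sourcetype=<your_sourcetype>) is a placeholder for whatever returns the raw events, and the time_state field name is made up for illustration.

```spl
index=<your_index> sourcetype=<your_sourcetype>
| rex field=_raw "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?<logons>\d+)"
| eval _time=strptime(time, "%Y-%m-%d %H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| eval time_state=if(isnull(_time), "missing", "present")
| stats count by time_state
```

Running the same search with everything from the fit line onward replaced by the last two lines should show whether the timestamps are already missing before fit or only afterwards.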

 

Any ideas?

Gabriel
Path Finder

Hey there,

The results of the | fit command are affected by the time range picker. Once you set the time range to All time, _time is displayed normally.

Edit: I looked into the interaction between inputlookup, fit and the time range picker. According to the documentation, the results of the fit command are appended to the initial dataset. In this case, the expected outcome would be that the resulting table includes only the rows covered by the time range picker. However, the following happens:

Time range picker: All time
Resulting table: Initial dataset + output of fit command
Result: OK, expected result

Time range picker: Some time before the first observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, expected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: About halfway through the dataset timestamps - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: After some time of the last observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: Some time before the first observation - some time stamp after the last observation
Resulting table: output of fit command
Result: NOT OK, unexpected result

I checked the sources that were available to me (search.log, .py files) but sadly this did not suffice to reverse engineer how the initial dataset and the output of the fit command are merged and filtered. It seems that earliest has no effect, but once latest is set to a timestamp, the behavior becomes unexpected.
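
To see which time window the search itself carries, the addinfo command can be placed before the fit step. It adds info_min_time and info_max_time, the boundaries of the search time range. The sketch below only counts how many timechart rows fall inside or outside that window. I have not verified that fit uses these values, so treat it as a diagnostic aid rather than an explanation. The field names picker_latest and in_picker_range are made up for illustration, and the "+Infinity" check is there because info_max_time holds that value when no latest time is set (for example with All time).

```spl
| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| timechart span=15m avg(logons) as logons
| addinfo
| eval picker_latest=if(info_max_time="+Infinity", now(), info_max_time)
| eval in_picker_range=if(_time>=info_min_time AND _time<=picker_latest, "inside", "outside")
| stats count min(_time) as first_time max(_time) as last_time by in_picker_range
| convert ctime(first_time) ctime(last_time)
```

Comparing the output for the five picker settings above should show whether the unexpected cases line up with rows falling outside the picker window.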

indeed_2000
Motivator

@pdrieger_splunk any idea?

 
