Splunk Search

Issue with "_time" after using the fit command in DLTK

indeed_2000
Motivator

Hi,

Here is the default SPL from the Splunk App for Data Science and Deep Learning (Time Series Anomalies with STUMPY - Time Series Anomaly Detection with Matrix Profiles):

| inputlookup cyclical_business_process.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile
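To make the default search easier to follow, here is a minimal pure-Python sketch of what it computes. It assumes the usual STUMPY definition of a matrix profile: for each window of m points, the z-normalized Euclidean distance to its nearest neighbour outside an exclusion zone of ceil(m/4) positions. The function names are hypothetical. This is not the DLTK notebook code, STUMPY itself uses a much faster algorithm, and Splunk's p95() may interpolate differently from the linear method used here. Note also that the profile has len(series) - m + 1 entries; how the container maps those back onto result rows is not shown here.

```python
import math


def znorm(window):
    """Z-normalize one subsequence; a constant window becomes all zeros."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
    if sd == 0:
        return [0.0] * len(window)
    return [(x - mu) / sd for x in window]


def matrix_profile(series, m):
    """Naive O(n^2 * m) matrix profile: for each length-m window, the z-normalized
    Euclidean distance to its nearest neighbour outside the exclusion zone."""
    n_sub = len(series) - m + 1
    if n_sub < 1:
        raise ValueError("series must contain at least m points")
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = math.ceil(m / 4)  # assumption: mirrors STUMPY's default exclusion zone
    profile = []
    for i in range(n_sub):
        best = math.inf  # stays inf if no window lies outside the exclusion zone
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # skip trivial (overlapping) self-matches
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, dist)
        profile.append(best)
    return profile


def percentile(values, p):
    """Linear-interpolation percentile; Splunk's p95() may use a different method."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def flag_anomalies(profile, p=95):
    """Mirror of: eventstats p95(matrix_profile) + eval anomaly=if(matrix_profile>p95,1,0)."""
    threshold = percentile(profile, p)
    return [1 if v > threshold else 0 for v in profile]


# Usage: a periodic signal (period 12) with one injected spike at index 62.
series = [math.sin(2 * math.pi * i / 12) for i in range(120)]
series[62] += 3
profile = matrix_profile(series, m=12)
flags = flag_anomalies(profile)
print([i for i, f in enumerate(flags) if f])  # window start indices flagged as anomalous
```

Every window that repeats elsewhere in the signal has a near-zero profile value, so only the windows overlapping the spike rise above the 95th percentile, which is the same idea the anomaly field in the search relies on.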

Now I want to run this search on my own data. Here is a sample of the log:

2022-11-30 23:59:00,122,124
2022-11-30 23:58:00,113,112
2022-11-30 23:57:00,144,143
2022-11-30 23:56:00,137,138
2022-11-30 23:55:00,119,120
2022-11-30 23:54:00,103,102
2022-11-30 23:53:00,104,105
2022-11-30 23:52:00,143,142
2022-11-30 23:51:00,138,139
2022-11-30 23:50:00,155,153
2022-11-30 23:49:00,100,102

For example, the first line breaks down as:

timestamp: 2022-11-30 23:59:00

logons: 122

Here is the SPL I run:
| rex field=_raw "(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?<logons>\d+)"
| eval _time=strptime(time, "%Y-%m-%d %H:%M:%S")
| timechart span=15m avg(logons) as logons
| fit MLTKContainer algo=stumpy m=96 logons from _time into app:stumpy_anomalies
| table _time logons matrix_profile
| eventstats p95(matrix_profile) as p95_matrix_profile
| eval anomaly=if(matrix_profile>p95_matrix_profile,1,0)
| fields - p95_matrix_profile
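Since _time is correct before fit, the parsing itself is presumably fine. As a quick offline check, here is a minimal Python sketch that applies the same regex and strptime format to the sample lines and averages them into 15-minute buckets. The function names are hypothetical, Python writes named groups as (?P<name>...) instead of rex's (?<name>...), and real timechart bucketing also depends on the search timezone, which this naive-datetime sketch ignores.

```python
import re
from collections import defaultdict
from datetime import datetime

# Same pattern as the rex in the search; Python spells named groups (?P<name>...)
LINE_RE = re.compile(r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<logons>\d+)")


def parse(lines):
    """Return (timestamp, logons) pairs, like rex + eval _time=strptime(time, ...)."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line)  # rex is unanchored, so search() rather than match()
        if match:
            ts = datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S")
            rows.append((ts, int(match.group("logons"))))
    return rows


def bucket_avg(rows, span_minutes=15):
    """Average logons per span-aligned bucket, roughly like timechart span=15m avg(logons)."""
    buckets = defaultdict(list)
    for ts, value in rows:
        start = ts.replace(minute=ts.minute - ts.minute % span_minutes, second=0, microsecond=0)
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


SAMPLE = """\
2022-11-30 23:59:00,122,124
2022-11-30 23:58:00,113,112
2022-11-30 23:57:00,144,143
2022-11-30 23:56:00,137,138
2022-11-30 23:55:00,119,120
2022-11-30 23:54:00,103,102
2022-11-30 23:53:00,104,105
2022-11-30 23:52:00,143,142
2022-11-30 23:51:00,138,139
2022-11-30 23:50:00,155,153
2022-11-30 23:49:00,100,102""".splitlines()

rows = parse(SAMPLE)
for start, avg in bucket_avg(rows).items():
    print(start, round(avg, 2))  # prints: 2022-11-30 23:45:00 125.27
```

All eleven sample lines fall into a single 23:45 bucket. With span=15m and m=96, one matrix-profile window covers 96 x 15 minutes = 24 hours, so the real search needs more than a day of data before matrix_profile can be computed.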

Before the fit command, _time shows correctly, but after the fit command it is empty!

FYI: logons, matrix_profile, and anomaly are returned correctly; only _time is empty.

Any ideas?

Gabriel
Path Finder

Hey there,

Results of the | fit command are affected by the time range picker. Once you set the time range to All time, _time is displayed normally.

Edit: I looked into the interaction between inputlookup, fit, and the time range picker. As documented here, the results of the fit command are appended to the initial dataset. Given that, the expected outcome would be a resulting table that includes only the rows covered by the time range picker. However, the following happens:

Time range picker: All time
Resulting table: Initial dataset + output of fit command
Result: OK, expected result

Time range picker: Some time before the first observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, expected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: About halfway through the dataset timestamps - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: After some time of the last observation - now
Resulting table: Initial dataset + output of fit command
Result: OK, unexpected result (Warning: The specified span would result in too many (>50000) rows.)

Time range picker: Some time before the first observation - some time stamp after the last observation
Resulting table: output of fit command
Result: NOT OK, unexpected result

I checked the sources available to me (search.log, .py files), but unfortunately they were not enough to reverse-engineer how the initial dataset and the output of the fit command are merged and filtered. It seems that earliest has no effect, but once latest is set to a specific timestamp, the behavior becomes unexpected.

indeed_2000
Motivator

@pdrieger_splunk any idea?
