Machine learning outliers

rrovers · ‎08-17-2020

I want to use the machine learning toolkit to detect outliers.

I've made a query with earliest=-2mon@mon latest=@mon to let Splunk determine the outlier values for that period. I want to run the search every day and have the alert send an email when a new outlier has been detected since the last run.

I can't figure out how to do this. Every time the search runs, it detects all outliers of the last 2 months.

to4kawa · ‎08-17-2020

https://www.splunk.com/en_us/blog/machine-learning/cyclical-statistical-forecasts-and-anomalies-part...

rrovers · ‎08-17-2020

Thank you for the information. Your answer is quite extensive and probably useful for learning more about machine learning.

What I want to know is whether it is possible to let machine learning determine the lower bound and upper bound over a long period (for example 2 months, or maybe even 1 year) and run the search every day as an alert that only gives me the new outliers (since the previous day).

to4kawa · ‎08-17-2020

https://docs.splunk.com/Documentation/MLApp/5.2.0/User/DNOlegacyassist

I don't think we need to get too hung up on machine learning.

rrovers · ‎08-18-2020

The functionality seems the same as I used for my alert. The result is not what I'm looking for.

Let's try to clarify it with an example.

In the machine learning app I created an experiment with this simple search to use for this example:

index=_internal sourcetype="splunkd_remote_searches" earliest=-1w@w latest=now
| eval day=strftime(_time,"%Y-%m-%d")
| stats count by day

1 outlier is detected.

After saving this experiment, I created an alert from the overview screen (manage - create alert).

My goal is to use the period since last week to determine the lower and upper bound, but only receive an alert when there are new outliers since the last run. Right now, every run covers the last week again and detects the same outlier.

I wonder whether what I want is possible.

to4kawa · ‎08-18-2020

Use outputcsv and build a query against the CSV.

rrovers · ‎08-19-2020

Hi, thanks for your answer. I gave it a try, but I'm missing some information.

Can you please explain a bit more?

rrovers · ‎08-19-2020

I think I solved it by joining 2 searches.

The first one determines the lower bound and the upper bound over a long period (the last 2 months).

The second one checks whether the event count for the last day is less than the lower bound or more than the upper bound determined by the first one.
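The searches themselves were not posted, so the following is only a minimal sketch of how such a join could look, reusing the example search from earlier in the thread. The bounds here are mean plus or minus 2 standard deviations of the daily count, which is an assumption (the multiplier and the method are not from the post). The field names joinkey, daily_count, mean, sd, lowerBound and upperBound are made up for illustration, and the baseline window deliberately stops before the day being checked.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1d@d latest=@d
| stats count
| eval joinkey=1
| join joinkey
    [ search index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@d latest=-1d@d
      | timechart span=1d count as daily_count
      | stats avg(daily_count) as mean, stdev(daily_count) as sd
      | eval lowerBound = mean - 2 * sd, upperBound = mean + 2 * sd, joinkey=1
      | fields joinkey lowerBound upperBound ]
| where count < lowerBound OR count > upperBound
```

The outer search returns a row only when yesterday's count falls outside the bounds, so an alert that triggers when the number of results is greater than 0 would fire only for a new outlier.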

2savage · ‎10-08-2020

Aloha @rrovers ,

I think what @to4kawa is saying you should do is to create a lookup file and output your results to that lookup. For example, you would output _time, machinename, and whatever field you believe is valuable using | outputcsv. In turn, you can query the lookup file each day to remove the previous days' outliers.
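A rough sketch of that idea follows. It swaps in lookup and outputlookup rather than outputcsv, because a file written by outputlookup can be read back with the lookup command. The file name reported_outliers.csv is hypothetical, and the sketch assumes that file already exists with day and count columns. The outlier test (mean plus or minus 2 standard deviations) and the field names mean, sd and reported_count are also assumptions for illustration only.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@d latest=@d
| timechart span=1d count
| eventstats avg(count) as mean, stdev(count) as sd
| where count < mean - 2 * sd OR count > mean + 2 * sd
| eval day=strftime(_time,"%Y-%m-%d")
| lookup reported_outliers.csv day OUTPUT count as reported_count
| where isnull(reported_count)
| table day count
| outputlookup append=true reported_outliers.csv
```

Each run drops the days that are already in the lookup, returns only the outliers not seen before, and appends them to the file so that the next run skips them.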

I am also going to be experimenting with machine learning and looking to build profiles for users and computers, probably with login patterns at first. Using outputcsv would be a good way to keep track of results, though I'm sure there are other ways to do it.

rrovers · ‎10-09-2020

Hi @2savage,

I solved it by collecting the daily results in a summary index (I prefer summary indexes over lookups for this kind of functionality).

I don't think this is part of machine learning but it works fine for me.
