Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?

Explorer

I want to run a daily alert that checks for outliers in host crashes via the MLTK time series forecast algorithm; however, the syntax doesn't handle forecasting multiple hosts well, so I have an initial filter that lists the hosts whose crash count is higher than average. I want to take this output and then, for each host in the list, run the outlier detection as follows:

| timechart span=1d sum(VOLUME) 
| predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="14" period=7 lower95=lower95 upper95=upper95 
| eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' < 'lower95(prediction)' OR 'sum(VOLUME)' > 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier

But I'm not sure of the best way to go about this. I know I can output the results of my initial filtering search to a lookup and then have separate queries that say "for the host from row 1, run outlier detection," then "for the host from row 2, run outlier detection," and so on, but this would require a separate alert query for however many rows I want to include. What I would really like is a single query that iterates through the results of my initial filter and, for each row, grabs the host and runs the outlier detection. Is there a way to run a loop like this?

1 Solution

Communicator

Hi @TylerJVitale,

So far this works:

| makeresults 
| eval hosts_predict=split("host1,host2,host3,host4,host5", ",")
| mvexpand hosts_predict
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | bin _time span=1d | stats sum(VOLUME) as sum_VOLUME by _time host | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower95=lower95 upper95=upper95 | filldown host"
| eval isOutlier=if(sum_VOLUME < 'lower95(prediction)' OR sum_VOLUME > 'upper95(prediction)', 1, 0)
| where isOutlier=1
| fields - isOutlier

A little explanation:

  1. In the makeresults command you select the hosts that you would like to run the predict command on. These could come from a lookup or from the results of another search.
  2. The map command then takes each row as input to the predict search. That inner search is where the data for the analysis comes from; only the host it is applied to changes. It is highly recommended to feed it some form of summary or accelerated data since, depending on your setup, this could take a long time and consume a lot of resources.
  3. The last eval command is the outlier detection from your original search.

I agree with @grana_splunk that it is highly recommended to evaluate another way to accomplish the outlier detection logic. Here are several alternatives:

  • Run a report to generate a lookup that contains a (customizable) per-day threshold per host, and compare it with your current data. This greatly reduces the overhead of running the predict command in an alert, since it becomes a simple lookup operation. (I have applied this in many scenarios and it works great.)
  • As mentioned by @grana_splunk, use the DensityFunction algorithm in MLTK 4.2.
  • For outlier detection algorithms you could use the IQR (interquartile range) or standard deviation. You should check how Splunk ITSI applies some of these procedures to generate Adaptive Thresholds. https://www.splunk.com/blog/2018/01/16/ensuring-success-with-itsi-threshold-and-alert-configurations...
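The IQR alternative from the last bullet can be sketched in plain SPL without MLTK. The index and field names below are placeholders from this thread, and the 1.5 multiplier is just the conventional IQR fence, so tune both to your data:

```
index="index_to_search_in"
| bin _time span=1d
| stats sum(VOLUME) as sum_VOLUME by _time host
| eventstats p25(sum_VOLUME) as p25 p75(sum_VOLUME) as p75 by host
| eval IQR = p75 - p25
| eval isOutlier = if(sum_VOLUME > p75 + 1.5*IQR OR sum_VOLUME < p25 - 1.5*IQR, 1, 0)
| where isOutlier=1
```

This ignores seasonality and trend, but the per-host percentiles make it cheap enough to run directly in an alert.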

To replace the makeresults you could do the following:

index="index_to_search_in" 
| table host
| dedup host
| rename host as hosts_predict
| map ...

With a lookup:

| inputlookup list_of_hosts.csv 
| fields host
| rename host as hosts_predict
| map ...

Hope it helps


Explorer

This might work. I would have loved to use the DensityFunction, but we don't have MLTK 4.2, and IQR or standard deviation won't work because they can't filter out seasonality and trend.

Few questions:

  1. In the makeresults pipe, if I'm using the contents of a lookup, how would I write that (say, for example, I'm outputting the results of a scheduled report to "mylookup")?
  2. In the split command, where you have host1, host2, etc., are those just stand-ins for the actual host names? I want the list of hosts to be dynamic, based on another search I have (which uses standard deviation to narrow the list of potential outliers), so how can I account for that?

Thanks,
Tyler


Communicator

I will edit the answer so it reflects how you could replace the makeresults with a lookup or a search.


Explorer

Also, the predict command requires a preceding timechart, at least in my version of the MLTK. And with timechart it gets messy if you try to predict by host.


Communicator

You could replace the stats with chart or with timechart before the predict command, specifying span=1d, as follows:

With chart:
...
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | bin _time span=1d | chart sum(VOLUME) as sum_VOLUME last(host) as host by _time | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower95=lower95 upper95=upper95 | filldown host"
...

OR timechart:
...
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | timechart sum(VOLUME) as sum_VOLUME last(host) as host span=1d | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower95=lower95 upper95=upper95 | filldown host"
...


Splunk Employee

A confidence interval has nothing to do with whether a point is an outlier or not. Please do not use forecasting to find your outliers. I suggest you go through this blog post and look into the new algorithm we have in MLTK: https://www.splunk.com/blog/2019/03/20/what-s-new-in-the-splunk-machine-learning-toolkit-4-2.html
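For reference, a minimal DensityFunction sketch along the lines of that blog post, assuming MLTK 4.2+. The index/field/model names are placeholders from this thread, and the output field name IsOutlier(sum_VOLUME) is as I recall it from the MLTK docs, so verify it in your version:

```
index="index_to_search_in"
| bin _time span=1d
| stats sum(VOLUME) as sum_VOLUME by _time host
| fit DensityFunction sum_VOLUME by host threshold=0.01 into volume_density_model
| where 'IsOutlier(sum_VOLUME)'=1
```

Once fitted, "| apply volume_density_model" can score new data in the alert search instead of refitting on every run.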


Explorer

The forecasting algorithm in MLTK has an outlier panel, so why shouldn't I use it? It does exactly what I want it to, creating a model that accounts for seasonality and trend and then constructing a CI around that. If the number of crashes falls outside that CI, I would like to be alerted. Why is this not okay?

As for the new MLTK, we're not up to date on it and I'm not sure if/when we will upgrade, so this will have to do for now


Communicator

Hi @TylerJVitale. Have you tried the map command? Although I'm not sure if it's optimal to use it in conjunction with the predict command.

https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Map


Explorer

This seems like it could work. I'm just having difficulty figuring out how to configure it. At the end of my initial query, I have a table with host and avg VOLUME. I want to run the timechart and prediction for each host, but even just tacking on something like

| map search="search index=index sourcetype="sourcetype" host="$host$" | timechart span=1h sum(VOLUME)"

gives me no results, so I'm not sure where the issue is or how to fix it. My best guess is it's something with the search ID field.
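One likely culprit in the snippet above: the inner double quotes inside the map search string terminate the string early, so they need to be escaped with backslashes. A sketch with placeholder index and sourcetype names:

```
| map maxsearches=10 search="search index=my_index sourcetype=\"my_sourcetype\" host=\"$host$\" | timechart span=1h sum(VOLUME)"
```

Also note that map only substitutes tokens for fields present in each input row, so the host field must survive (e.g. via table or stats) up to the map call.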


Communicator

Ok, I will try to run a test and come back with an answer; meanwhile, for outlier detection you could read the following:
https://docs.splunk.com/Documentation/Splunk/7.3.0/Search/Findingandremovingoutliers
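That page covers eventstats-based approaches; a standard-deviation variant, similar to the prefilter mentioned earlier in the thread, could look like the sketch below (placeholder names; 3 sigma is an illustrative cutoff):

```
index="index_to_search_in"
| bin _time span=1d
| stats sum(VOLUME) as sum_VOLUME by _time host
| eventstats avg(sum_VOLUME) as avg_vol stdev(sum_VOLUME) as stdev_vol by host
| eval isOutlier = if(abs(sum_VOLUME - avg_vol) > 3*stdev_vol, 1, 0)
| where isOutlier=1
```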
