<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection? in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409045#M49921</link>
    <description>&lt;P&gt;I will edit the answer so it reflects how you could replace the makeresults with a lookup or a search.&lt;/P&gt;</description>
    <pubDate>Thu, 01 Aug 2019 17:31:57 GMT</pubDate>
    <dc:creator>jaime_ramirez</dc:creator>
    <dc:date>2019-08-01T17:31:57Z</dc:date>
    <item>
      <title>Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409035#M49911</link>
      <description>&lt;P&gt;I want to run a daily alert to check for outliers in host crashes via the MLTK time series forecast algorithm; however, the syntax is not optimal for forecasting multiple hosts, so I have an initial filter which shows the list of hosts if the amount of crashes is higher than average. I want to take this output and then, for each host in the list, run the outlier detection as follows:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| timechart span=1d sum(VOLUME) 
| predict "sum(VOLUME)" as prediction algorithm="LLP5" future_timespan="30" holdback="14" period=7 lower"95"=lower"95" upper"95"=upper"95" 
| eval isOutlier = if(prediction!="" AND 'sum(VOLUME)' !="" AND ('sum(VOLUME)' &amp;lt; 'lower95(prediction)' OR 'sum(VOLUME)' &amp;gt; 'upper95(prediction)'), 1, 0) 
| where isOutlier=1 
| fields - isOutlier
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;But I'm not sure the best way to go about this. I know I can output the results from my initial filtering search to a lookup and then have separate queries that say "for the host from row 1, run outlier detection," and then "for host from row 2, run outlier detection," etc. but this would require separate alert queries for however many rows I would want to include.  What I would really like is a query that iterates through the results of my initial filter, and then for each row, grab the host and run the outlier detection. Is there a way to run a loop like this?&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jul 2019 13:29:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409035#M49911</guid>
      <dc:creator>TylerJVitale</dc:creator>
      <dc:date>2019-07-29T13:29:22Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409036#M49912</link>
      <description>&lt;P&gt;Hi @TylerJVitale. Have you tried the &lt;STRONG&gt;map&lt;/STRONG&gt; command? I'm not sure whether it's optimal to use it in conjunction with the predict command, though.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Map"&gt;https://docs.splunk.com/Documentation/Splunk/7.3.0/SearchReference/Map&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jul 2019 22:51:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409036#M49912</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2019-07-29T22:51:54Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409037#M49913</link>
      <description>&lt;P&gt;A confidence interval has nothing to do with whether a point is an outlier. Please do not use forecasting to find your outliers. I suggest reading this blog post and looking into the new algorithm we have in MLTK: &lt;A href="https://www.splunk.com/blog/2019/03/20/what-s-new-in-the-splunk-machine-learning-toolkit-4-2.html"&gt;https://www.splunk.com/blog/2019/03/20/what-s-new-in-the-splunk-machine-learning-toolkit-4-2.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jul 2019 23:42:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409037#M49913</guid>
      <dc:creator>grana_splunk</dc:creator>
      <dc:date>2019-07-30T23:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409038#M49914</link>
      <description>&lt;P&gt;The forecasting algorithm in MLTK has an outlier panel, so why shouldn't I use it? It does exactly what I want it to, creating a model that accounts for seasonality and trend and then constructing a CI around that. If the number of crashes falls outside that CI, I would like to be alerted. Why is this not okay?&lt;/P&gt;

&lt;P&gt;As for the new MLTK, we're not up to date on it and I'm not sure if/when we will upgrade, so this will have to do for now.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 11:49:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409038#M49914</guid>
      <dc:creator>TylerJVitale</dc:creator>
      <dc:date>2019-07-31T11:49:11Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409039#M49915</link>
      <description>&lt;P&gt;This seems like it could work. I'm just having difficulty figuring out how to configure it. At the end of my initial query, I have a table with host avg VOLUME. I want to run the timechart and prediction for each host, but even just tacking on something like &lt;CODE&gt;|map search="search index=index sourcetype="sourcetype" host="$host$"&lt;BR /&gt;
| timechart span=1h sum(VOLUME)"&lt;/CODE&gt; gives me no results, so I'm not sure where the issue is or how to fix it. My best guess is it's something with the search ID field.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 12:14:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409039#M49915</guid>
      <dc:creator>TylerJVitale</dc:creator>
      <dc:date>2019-07-31T12:14:49Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409040#M49916</link>
      <description>&lt;P&gt;OK, I will run a test and try to come back with an answer. Meanwhile, for outlier detection you could read the following:&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.3.0/Search/Findingandremovingoutliers"&gt;https://docs.splunk.com/Documentation/Splunk/7.3.0/Search/Findingandremovingoutliers&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 16:42:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409040#M49916</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2019-07-31T16:42:48Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409041#M49917</link>
      <description>&lt;P&gt;Hi @TylerJVitale, &lt;/P&gt;

&lt;P&gt;So far this works:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults 
| eval hosts_predict=split("host1,host2,host3,host4,host5", ",")
| mvexpand hosts_predict
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | bin _time span=1d | stats sum(VOLUME) as sum_VOLUME by _time host | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower\"95\"=lower\"95\" upper\"95\"=upper\"95\" | filldown host"
| eval isOutlier=if(sum_VOLUME &amp;lt; 'lower95(prediction)' OR sum_VOLUME &amp;gt; 'upper95(prediction)', 1, 0)
| where isOutlier=1
| fields - isOutlier
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;A little explaining:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;The &lt;STRONG&gt;makeresults&lt;/STRONG&gt; command selects the hosts that you would like to run the &lt;STRONG&gt;predict&lt;/STRONG&gt; command against. This list could instead come from a lookup or from the results of another search.&lt;/LI&gt;
&lt;LI&gt;The &lt;STRONG&gt;map&lt;/STRONG&gt; command then takes each row as input to feed the predict search. This search is where the data for the analysis comes from; only the host it is applied to changes. &lt;STRONG&gt;It is highly recommended to feed it with some form of summary or accelerated data&lt;/STRONG&gt; since, depending on your setup, this could take very long and consume a lot of resources.&lt;/LI&gt;
&lt;LI&gt;The last &lt;STRONG&gt;eval&lt;/STRONG&gt; command is the outlier detection from your original search.&lt;/LI&gt;
&lt;/OL&gt;
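&lt;P&gt;To see how &lt;STRONG&gt;map&lt;/STRONG&gt; substitutes the token on its own, here is a minimal sketch that does not touch any index. The host names are placeholders, and the inner search is only an illustration, not part of the real pipeline. Note that quotes inside the search string are escaped with a backslash; leaving them unescaped ends the string early.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults 
| eval hosts_predict=split("host1,host2", ",")
| mvexpand hosts_predict
| map maxsearches=2 search="| makeresults | eval host=\"$hosts_predict$\""
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This should return one row per input row, each with host set to the value that was passed in through $hosts_predict$.&lt;/P&gt;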

&lt;P&gt;I agree with @grana_splunk that it is highly recommended to evaluate another way to accomplish the outlier detection logic. Here are several alternatives:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Run a report to generate a &lt;STRONG&gt;lookup&lt;/STRONG&gt; that contains a customizable per-day threshold for each host, and compare it with your current data. This greatly reduces the overhead of running the predict command in an alert, since the check becomes a simple lookup operation. (I have applied this in many scenarios and it works great.)&lt;/LI&gt;
&lt;LI&gt;As mentioned by @grana_splunk, use the &lt;STRONG&gt;DensityFunction&lt;/STRONG&gt; algorithm in the &lt;STRONG&gt;MLTK 4.2&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;For algorithms to detect outliers, you could use IQR (interquartile range) or standard deviation. You should check how Splunk ITSI applies some of these procedures to generate &lt;STRONG&gt;Adaptive Thresholds&lt;/STRONG&gt;. &lt;A href="https://www.splunk.com/blog/2018/01/16/ensuring-success-with-itsi-threshold-and-alert-configurations-part-2-adaptive-thresholding.html"&gt;https://www.splunk.com/blog/2018/01/16/ensuring-success-with-itsi-threshold-and-alert-configurations-part-2-adaptive-thresholding.html&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
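
&lt;P&gt;As a rough sketch of the IQR idea, something like the following could flag days outside per-host bounds. The 30-day window, the 1.5 multiplier and the field names q1, q3, iqr, lower_bound and upper_bound are my assumptions, not anything taken from ITSI, and a plain IQR like this does not account for trend or seasonality.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_to_search_in" earliest=-30d@d latest=@d
| bin _time span=1d
| stats sum(VOLUME) as sum_VOLUME by _time host
| eventstats perc25(sum_VOLUME) as q1 perc75(sum_VOLUME) as q3 by host
| eval iqr=q3-q1
| eval lower_bound=q1-1.5*iqr
| eval upper_bound=q3+1.5*iqr
| where sum_VOLUME &amp;lt; lower_bound OR sum_VOLUME &amp;gt; upper_bound
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Because the bounds are computed with eventstats by host, every host is handled in a single search and no map is needed.&lt;/P&gt;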

&lt;P&gt;&lt;STRONG&gt;To replace the makeresults you could do the following:&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_to_search_in" 
| table host
| dedup host
| rename host as hosts_predict
| map ...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With a &lt;STRONG&gt;lookup&lt;/STRONG&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup list_of_hosts.csv 
| fields host
| rename host as hosts_predict
| map ...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hope it helps&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2019 18:49:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409041#M49917</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2019-07-31T18:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409042#M49918</link>
      <description>&lt;P&gt;This might work. I would have loved to use the DensityFunction but we don't have the MLTK 4.2, and IQR or StandardDeviation won't work because they can't filter seasonality and trend. &lt;/P&gt;

&lt;P&gt;Few questions:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;In the makeresults pipe, if I'm using the contents of a lookup, how would I write that (say, for example, I'm outputting the results of a scheduled report to "mylookup")?&lt;/LI&gt;
&lt;LI&gt;In the split command, where you have host1, host2, etc., are those just stand-ins for the actual host names? I want the list of hosts to be dynamic based on another search I have (which uses standard deviation to narrow the list of potential outliers), so how can I account for that?&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Tyler&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2019 11:32:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409042#M49918</guid>
      <dc:creator>TylerJVitale</dc:creator>
      <dc:date>2019-08-01T11:32:47Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409043#M49919</link>
      <description>&lt;P&gt;Also, the predict command requires a preceding timechart, at least in my version of the MLTK. And with timechart it gets messy if you try to predict by host.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2019 12:10:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409043#M49919</guid>
      <dc:creator>TylerJVitale</dc:creator>
      <dc:date>2019-08-01T12:10:42Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409044#M49920</link>
      <description>&lt;P&gt;You could replace the stats with chart or with timechart before the predict command specifying the span=1d as follows:&lt;/P&gt;

&lt;P&gt;With &lt;STRONG&gt;chart&lt;/STRONG&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;...
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | bin _time span=1d | chart sum(VOLUME) as sum_VOLUME last(host) as host by _time | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower\"95\"=lower\"95\" upper\"95\"=upper\"95\" | filldown host"
...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Or with &lt;STRONG&gt;timechart&lt;/STRONG&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;...
| map maxsearches=5 search="search index=\"index_to_search_in\" latest=\"-0d@d\" host=\"$hosts_predict$\" | table _time host VOLUME | timechart sum(VOLUME) as sum_VOLUME last(host) as host span=1d | predict sum_VOLUME as prediction algorithm=\"LLP5\" future_timespan=\"30\" holdback=\"14\" period=7 lower\"95\"=lower\"95\" upper\"95\"=upper\"95\" | filldown host"
...
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 30 Sep 2020 01:34:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409044#M49920</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2020-09-30T01:34:36Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning Toolkit: How to iterate through a result set and then run an outlier detection?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409045#M49921</link>
      <description>&lt;P&gt;I will edit the answer so it reflects how you could replace the makeresults with a lookup or a search.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Aug 2019 17:31:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-Toolkit-How-to-iterate-through-a-result/m-p/409045#M49921</guid>
      <dc:creator>jaime_ramirez</dc:creator>
      <dc:date>2019-08-01T17:31:57Z</dc:date>
    </item>
  </channel>
</rss>

