<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Machine Learning Toolkit: What is a good practical example to use this with alerting in a system with too many records for an All Time search? in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Machine-Learning-Toolkit-What-is-a-good-practical-example-to-use/m-p/241048#M27273</link>
    <description>&lt;P&gt;I think you are asking for the following? &lt;/P&gt;

&lt;P&gt;The Numeric Outlier Detector, as of 2.1, does not store a model file for the statistical observations used to score anomalies. You can store the values from a previous search (a priori)  by using | outputlookup MyNewModelFile.csv and then | inputlookup in a real time search to enrich the new events with the observed values.&lt;/P&gt;

&lt;P&gt;Using your specific example, I would:&lt;BR /&gt;
1) click the "Open in Search" button, copying the search to a new Splunk window for editing&lt;BR /&gt;
2) edit it to look like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup hostperf.csv 
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") 
| timechart span=10m max(rtmax) as responsetime 
| head 1000
| eventstats median("responsetime") as median 
| eval absDev=(abs('responsetime'-median))
| eventstats median(absDev) as medianAbsDev
| eval lowerBound=(median-medianAbsDev*exact(2)), upperBound=(median+medianAbsDev*exact(2))
| outputlookup MySavedModel.csv 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;lowerBound&lt;/CODE&gt;/&lt;CODE&gt;upperBound&lt;/CODE&gt; eval computes the median-absolute-deviation bounds in the same style as the showcase (the multiplier 2 is illustrative; tune it to your data), and &lt;CODE&gt;| outputlookup MySavedModel.csv&lt;/CODE&gt; persists them; remove the &lt;CODE&gt;| head 1000&lt;/CODE&gt; clause as needed. The point of MySavedModel.csv is that you store the anomaly thresholds detected by every scheduled run over a long time window and then score alerts from a smaller-window "real time" search.&lt;BR /&gt;
3) save the search as a report that runs in the background and updates itself every N minutes as needed&lt;BR /&gt;
4) use &lt;CODE&gt;inputlookup&lt;/CODE&gt; (with a time join) on MySavedModel.csv to enrich the new data as it comes in and drive an eval/alert. The new search will use&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| eval isOutlier=if('responsetime' &amp;lt; lowerBound OR 'responsetime' &amp;gt; upperBound, 1, 0)
&lt;/CODE&gt;&lt;/PRE&gt;
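
&lt;P&gt;Concretely, step 4 might look like the following sketch (the index and sourcetype names are placeholders, and it assumes MySavedModel.csv carries a single global pair of bounds; &lt;CODE&gt;appendcols&lt;/CODE&gt; attaches the bounds to the first row and &lt;CODE&gt;filldown&lt;/CODE&gt; propagates them to the rest):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=myindex sourcetype=myperf
| timechart span=10m max(rtmax) as responsetime
| appendcols [| inputlookup MySavedModel.csv | head 1 | fields lowerBound upperBound]
| filldown lowerBound upperBound
| eval isOutlier=if('responsetime' &amp;lt; lowerBound OR 'responsetime' &amp;gt; upperBound, 1, 0)
&lt;/CODE&gt;&lt;/PRE&gt;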

&lt;P&gt;using the lowerBound and upperBound stored from the observed window, and the alert can fire on the isOutlier bit.&lt;/P&gt;</description>
    <pubDate>Mon, 27 Mar 2017 10:55:21 GMT</pubDate>
    <dc:creator>astein_splunk</dc:creator>
    <dc:date>2017-03-27T10:55:21Z</dc:date>
    <item>
      <title>Machine Learning Toolkit: What is a good practical example to use this with alerting in a system with too many records for an All Time search?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Machine-Learning-Toolkit-What-is-a-good-practical-example-to-use/m-p/241047#M27272</link>
      <description>&lt;P&gt;I have downloaded the Machine Learning Toolkit and Showcase app and tested out the response time example. Looks cool!&lt;/P&gt;

&lt;P&gt;I like that you are shown the exact Splunk code you could use to identify outliers.&lt;/P&gt;

&lt;P&gt;The example shown is an "All time" search followed by &lt;CODE&gt;head 1000&lt;/CODE&gt; to get the latest 1000 entries.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup hostperf.csv | eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") | timechart span=10m max(rtmax) as responsetime | head 1000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And then you can use responsetime to find the outliers.&lt;/P&gt;

&lt;P&gt;In practice, how would we use this in a system where we have too many records for an All Time search? I am looking for good practical examples of how one might use this library for alerting. Somehow you want to compare your recent data against a search over a longer period of time.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2016 02:00:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Machine-Learning-Toolkit-What-is-a-good-practical-example-to-use/m-p/241047#M27272</guid>
      <dc:creator>burwell</dc:creator>
      <dc:date>2016-06-30T02:00:26Z</dc:date>
    </item>
    <item>
      <title>Re: Machine Learning Toolkit: What is a good practical example to use this with alerting in a system with too many records for an All Time search?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Machine-Learning-Toolkit-What-is-a-good-practical-example-to-use/m-p/241048#M27273</link>
      <description>&lt;P&gt;I think you are asking for the following? &lt;/P&gt;

&lt;P&gt;The Numeric Outlier Detector, as of 2.1, does not store a model file for the statistical observations used to score anomalies. You can store the values from a previous search (a priori)  by using | outputlookup MyNewModelFile.csv and then | inputlookup in a real time search to enrich the new events with the observed values.&lt;/P&gt;

&lt;P&gt;Using your specific example, I would:&lt;BR /&gt;
1) click the "Open in Search" button, copying the search to a new Splunk window for editing&lt;BR /&gt;
2) edit it to look like:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup hostperf.csv 
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") 
| timechart span=10m max(rtmax) as responsetime 
| head 1000
| eventstats median("responsetime") as median 
| eval absDev=(abs('responsetime'-median))
| eventstats median(absDev) as medianAbsDev
| eval lowerBound=(median-medianAbsDev*exact(2)), upperBound=(median+medianAbsDev*exact(2))
| outputlookup MySavedModel.csv 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;lowerBound&lt;/CODE&gt;/&lt;CODE&gt;upperBound&lt;/CODE&gt; eval computes the median-absolute-deviation bounds in the same style as the showcase (the multiplier 2 is illustrative; tune it to your data), and &lt;CODE&gt;| outputlookup MySavedModel.csv&lt;/CODE&gt; persists them; remove the &lt;CODE&gt;| head 1000&lt;/CODE&gt; clause as needed. The point of MySavedModel.csv is that you store the anomaly thresholds detected by every scheduled run over a long time window and then score alerts from a smaller-window "real time" search.&lt;BR /&gt;
3) save the search as a report that runs in the background and updates itself every N minutes as needed&lt;BR /&gt;
4) use &lt;CODE&gt;inputlookup&lt;/CODE&gt; (with a time join) on MySavedModel.csv to enrich the new data as it comes in and drive an eval/alert. The new search will use&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| eval isOutlier=if('responsetime' &amp;lt; lowerBound OR 'responsetime' &amp;gt; upperBound, 1, 0)
&lt;/CODE&gt;&lt;/PRE&gt;
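
&lt;P&gt;Concretely, step 4 might look like the following sketch (the index and sourcetype names are placeholders, and it assumes MySavedModel.csv carries a single global pair of bounds; &lt;CODE&gt;appendcols&lt;/CODE&gt; attaches the bounds to the first row and &lt;CODE&gt;filldown&lt;/CODE&gt; propagates them to the rest):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=myindex sourcetype=myperf
| timechart span=10m max(rtmax) as responsetime
| appendcols [| inputlookup MySavedModel.csv | head 1 | fields lowerBound upperBound]
| filldown lowerBound upperBound
| eval isOutlier=if('responsetime' &amp;lt; lowerBound OR 'responsetime' &amp;gt; upperBound, 1, 0)
&lt;/CODE&gt;&lt;/PRE&gt;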

&lt;P&gt;using the lowerBound and upperBound stored from the observed window, and the alert can fire on the isOutlier bit.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2017 10:55:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Machine-Learning-Toolkit-What-is-a-good-practical-example-to-use/m-p/241048#M27273</guid>
      <dc:creator>astein_splunk</dc:creator>
      <dc:date>2017-03-27T10:55:21Z</dc:date>
    </item>
  </channel>
</rss>

