<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Discrepancies between MLTK and Splunk App for Anomaly Detection in Dashboards &amp; Visualizations</title>
    <link>https://community.splunk.com/t5/Dashboards-Visualizations/Discrepancies-between-MLTK-and-Splunk-App-for-Anomaly-Detection/m-p/658532#M54252</link>
    <description>&lt;P&gt;When looking at accumulated runtime, MLTK cleanly detects all three spikes as outliers, whereas the Splunk App for Anomaly Detection detects only two of the three and also flags one false positive, even at medium sensitivity.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="mltk_median_runtime_comparison.PNG" style="width: 999px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/27302i32F83E17B19D022B/image-size/large?v=v2&amp;amp;px=999" role="button" title="mltk_median_runtime_comparison.PNG" alt="mltk_median_runtime_comparison.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;This is the search generated by MLTK:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;index=_audit host=XXXXXXXX action=search info=completed
| table _time host total_run_time savedsearch_name 
| eval total_run_time_mins=total_run_time/60 
| convert ctime(search_*) 
| eval savedsearch_name=if(savedsearch_name="","Ad-hoc",savedsearch_name) 
| search savedsearch_name!="_ACCEL*" AND savedsearch_name!="Ad-hoc" 
| timechart span=30m median(total_run_time_mins)

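``` Time features generated by MLTK: derive hour-of-day, day-of-week, day-of-month and month fields from _time, null out any of them that has fewer than 2 distinct values, then copy, drop and rename them (a round trip that leaves their values unchanged) ```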
| eval "atf_hour_of_day"=strftime(_time, "%H"), "atf_day_of_week"=strftime(_time, "%w-%A"), "atf_day_of_month"=strftime(_time, "%e"), "atf_month" = strftime(_time, "%m-%B") 
| eventstats dc("atf_hour_of_day"),dc("atf_day_of_week"),dc("atf_day_of_month"),dc("atf_month")
| eval "atf_hour_of_day"=if('dc(atf_hour_of_day)'&amp;lt;2, null(), 'atf_hour_of_day'),
    "atf_day_of_week"=if('dc(atf_day_of_week)'&amp;lt;2, null(), 'atf_day_of_week'),
    "atf_day_of_month"=if('dc(atf_day_of_month)'&amp;lt;2, null(), 'atf_day_of_month'),
    "atf_month"=if('dc(atf_month)'&amp;lt;2, null(), 'atf_month')
| fields - "dc(atf_hour_of_day)","dc(atf_day_of_week)","dc(atf_day_of_month)","dc(atf_month)"
| eval "_atf_hour_of_day_copy"=atf_hour_of_day,"_atf_day_of_week_copy"=atf_day_of_week,"_atf_day_of_month_copy"=atf_day_of_month,"_atf_month_copy"=atf_month
| fields - "atf_hour_of_day","atf_day_of_week","atf_day_of_month","atf_month"
| rename "_atf_hour_of_day_copy" as "atf_hour_of_day","_atf_day_of_week_copy" as "atf_day_of_week","_atf_day_of_month_copy" as "atf_day_of_month","_atf_month_copy" as "atf_month"

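``` DensityFunction fits a separate exponential distribution (dist=expon) for each atf_hour_of_day value, treats 1% (threshold=0.01) of the area under each fitted density as the outlier region, and saves the model with into ```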
| fit DensityFunction "median(total_run_time_mins)" by "atf_hour_of_day" dist=expon threshold=0.01 show_density=true show_options="feature_variables,split_by,params" into "_exp_draft_ca4283816029483bb0ebe68319e5c3e7"&lt;/LI-CODE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="anomaly_flakiness_runtime_criteria.png" style="width: 999px;"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/27300i78FC65E4240FC4E9/image-size/large?v=v2&amp;amp;px=999" role="button" title="anomaly_flakiness_runtime_criteria.png" alt="anomaly_flakiness_runtime_criteria.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;And this is the search generated by the Splunk App for Anomaly Detection:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;``` Same data as above ```

| dedup _time
| sort 0 _time 
| table _time XXXX
| interpolatemissingvalues value_field="XXXX"
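``` Unlike the MLTK search, this fit has no by clause, so the series is not split by hour of day before fitting ```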
| fit AutoAnomalyDetection XXXX job_name=test sensitivity=1
| table _time, XXXX, isOutlier, anomConf&lt;/LI-CODE&gt;&lt;P&gt;The main difference between the two searches is that MLTK uses:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| fit DensityFunction "median(total_run_time_mins)" by "atf_hour_of_day" dist=expon threshold=0.01 show_density=true show_options="feature_variables,split_by,params" into "_exp_draft_ca4283816029483bb0ebe68319e5c3e7"&lt;/LI-CODE&gt;&lt;P&gt;whereas the Anomaly Detection app uses:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| fit AutoAnomalyDetection XXXX job_name=test sensitivity=1
| table _time, XXXX, isOutlier, anomConf&lt;/LI-CODE&gt;&lt;P&gt;Any ideas why MLTK calls fit with DensityFunction while the Anomaly Detection app calls it with AutoAnomalyDetection, and why the results differ?&lt;/P&gt;</description>
    <pubDate>Sun, 24 Sep 2023 20:00:04 GMT</pubDate>
    <dc:creator>danielbb</dc:creator>
    <dc:date>2023-09-24T20:00:04Z</dc:date>
    <item>
      <title>Discrepancies between MLTK and Splunk App for Anomaly Detection</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Discrepancies-between-MLTK-and-Splunk-App-for-Anomaly-Detection/m-p/658532#M54252</link>
      <description></description>
      <pubDate>Sun, 24 Sep 2023 20:00:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Discrepancies-between-MLTK-and-Splunk-App-for-Anomaly-Detection/m-p/658532#M54252</guid>
      <dc:creator>danielbb</dc:creator>
      <dc:date>2023-09-24T20:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: Discrepancies between MLTK and Splunk App for Anomaly Detection</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Discrepancies-between-MLTK-and-Splunk-App-for-Anomaly-Detection/m-p/666890#M54545</link>
      <description>&lt;P&gt;DensityFunction and AutoAnomalyDetection are vastly different algorithms, so different results are to be expected. See &lt;A href="https://www.splunk.com/en_us/blog/platform/developing-the-splunk-app-for-anomaly-detection.html?locale=en_us" target="_blank"&gt;Developing the Splunk App for Anomaly Detection | Splunk&lt;/A&gt; for more on the Anomaly Detection app's custom algorithm, and &lt;A href="https://docs.splunk.com/Documentation/MLApp/5.4.1/User/Algorithms#Anomaly_Detection" target="_blank"&gt;Algorithms in the Machine Learning Toolkit - Splunk Documentation&lt;/A&gt; for MLTK's DensityFunction.&lt;/P&gt;&lt;P&gt;In my testing, at least, the ADESCA/Earthgecko-Skyline stack in the Anomaly Detection app is more prone to alerting on non-cyclical low values than the boundaries generated by DensityFunction are, though I have no good explanation for this behavior yet.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2023 14:28:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Discrepancies-between-MLTK-and-Splunk-App-for-Anomaly-Detection/m-p/666890#M54545</guid>
      <dc:creator>ljvc</dc:creator>
      <dc:date>2023-10-31T14:28:25Z</dc:date>
    </item>
  </channel>
</rss>

