<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Using streamstats with foreach command in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327057#M4686</link>
    <description>&lt;P&gt;My post highlights what you will probably eventually end up evolving to after you spend some time on this adventure. I started my journey in a similar place to where your head is at with this question. I am working on ways to scale it and make it easy to apply, so stay tuned!&lt;/P&gt;

&lt;P&gt;Applying MAD on a ton of hosts should be fine with the streamstats above, but I will be interested to see whether it accomplishes what you are after. &lt;/P&gt;

&lt;P&gt;The search above does not account for cyclical trends, but it will likely be good for a first pass at high-level monitoring for deviations, depending on the data. Out of curiosity, what KPI or trend are you applying this to?&lt;/P&gt;</description>
    <pubDate>Mon, 05 Jun 2017 14:04:05 GMT</pubDate>
    <dc:creator>mattymo</dc:creator>
    <dc:date>2017-06-05T14:04:05Z</dc:date>
    <item>
      <title>Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327051#M4680</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am a new Splunk user. I have recently started fiddling around with the Machine Learning Toolkit (MLTK). I'm trying to write the SPL to perform anomaly detection on thruput across all my hosts for a specified sourcetype. I have used the following SPL to extract my data from the Splunk internal logs:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=_internal source=*metrics.log group=*sourcetype* series=splunkd
| replace * WITH *hn IN host
| xyseries _time,host,kbps
| foreach *hn [ streamstats window=200 ....]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I will be using the median absolute deviation algorithm from the MLTK, the SPL for this is as shown:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| streamstats window=200 current=true median("fieldname") as median
| eval absDev=(abs('fieldname'-median))
| streamstats window=200 current=true median(absDev) as medianAbsDev
| eval lowerBound=(median-medianAbsDev*5) , upperBound=(median+medianAbsDev*5)
| eval isOutlier=if('fieldname' &amp;lt; lowerBound OR 'fieldname' &amp;gt; upperBound ,1,0)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;However, whenever I key in the "streamstats" command into the subsearch after calling the "foreach" command, I get the following error:&lt;/P&gt;

&lt;P&gt;Error in 'foreach' command: Search pipeline may not contain non-streaming commands&lt;/P&gt;

&lt;P&gt;Are there any workarounds for this?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 03:52:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327051#M4680</guid>
      <dc:creator>mngeow</dc:creator>
      <dc:date>2017-06-05T03:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327052#M4681</link>
      <description>&lt;P&gt;Updated the &lt;CODE&gt;streamstats&lt;/CODE&gt; code to use &lt;CODE&gt;by host&lt;/CODE&gt;; fixed other slight wording issues.&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;I'm not sure why you are trying to connect the "dots" that way.  Running individual &lt;CODE&gt;streamstats&lt;/CODE&gt; for each &lt;CODE&gt;host&lt;/CODE&gt; doesn't get you anything that &lt;CODE&gt;streamstats&lt;/CODE&gt; won't give you automatically with &lt;CODE&gt;by host&lt;/CODE&gt;.  &lt;/P&gt;

&lt;P&gt;Just do an initial &lt;CODE&gt;stats&lt;/CODE&gt; command to get the time-chunk by time-chunk data for each host.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=_internal source=*metrics.log group=*sourcetype* series=splunkd
 | bin _time span=5m
 | stats avg(kbps) as kbps by _time host
 | streamstats window=200 current=true median(kbps) as medianKbps by host
 | eval absDev=(abs(kbps-medianKbps))
 | streamstats window=200 current=true median(absDev) as medianAbsDev by host
 | eval lowerBound=(medianKbps-medianAbsDev*5) , upperBound=(medianKbps+medianAbsDev*5)
 | eval isOutlier=if(kbps&amp;lt; lowerBound OR kbps &amp;gt; upperBound ,1,0)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note 1 - with this sort of thing, I'd probably use &lt;CODE&gt;time_window&lt;/CODE&gt; rather than &lt;CODE&gt;window&lt;/CODE&gt;.  Wasn't sure 200 of what, so adjust the code as necessary.&lt;/P&gt;

&lt;P&gt;Note 2 - Splunk has standard deviation and percentile aggregate functions, so you'd really be better off with a single pass using &lt;CODE&gt;p95&lt;/CODE&gt; and &lt;CODE&gt;p5&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Note 3 - Please avoid using common reserved word/function names like &lt;CODE&gt;median&lt;/CODE&gt; as variable names in your code, or you will constantly be debugging things unnecessarily.  Especially - ALWAYS rename &lt;CODE&gt;count&lt;/CODE&gt; to something else, or you will regret it when you later try to &lt;CODE&gt;timechart&lt;/CODE&gt; the field that was previously left as &lt;CODE&gt;count&lt;/CODE&gt;, and it won't give you the results you expect.  Just rename &lt;CODE&gt;count&lt;/CODE&gt;.  Always. &lt;/P&gt;
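&lt;P&gt;For instance, the rename in Note 3 is a one-liner - a sketch only, where &lt;CODE&gt;eventCount&lt;/CODE&gt; is just an illustrative name:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | bin _time span=5m
 | stats count as eventCount by _time host
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Renaming at the point of aggregation means &lt;CODE&gt;count&lt;/CODE&gt; never exists downstream, so a later &lt;CODE&gt;timechart&lt;/CODE&gt; can't collide with it.&lt;/P&gt;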

&lt;HR /&gt;

&lt;P&gt;So, not knowing your data, here's my first cut at finding outliers, assuming that anything below the 5th percentile or above the 95th is an outlier...&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; earliest=-7d@h index=_internal source=*metrics.log group=*sourcetype* series=splunkd
 | bin _time span=5m
 | stats avg(kbps) as kbps by _time host
 | streamstats time_window=1000m current=true median(kbps) as medianKbps, p5(kbps) as p5Kbps p95(kbps) as p95Kbps by host
 | eval isOutlier=if(kbps&amp;lt; p5Kbps OR kbps &amp;gt; p95Kbps,1,0)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 05 Jun 2017 04:36:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327052#M4681</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-06-05T04:36:19Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327053#M4682</link>
      <description>&lt;P&gt;Rather than try to fix what you already have, I suggest you go back to the very beginning AFTER you digest this answer. It will give you more than you need, but it has EVERYTHING (prepare to do some work), and it doesn't use the black-box "magic" of ML:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html"&gt;https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 04:56:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327053#M4682</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-06-05T04:56:13Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327054#M4683</link>
      <description>&lt;P&gt;I've already seen the post that you have linked. The post is informative, as it describes a better way of doing anomaly detection, but essentially the median absolute deviation algorithm is still made into a macro and called for each individual field; I am trying to automate this process.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 06:14:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327054#M4683</guid>
      <dc:creator>mngeow</dc:creator>
      <dc:date>2017-06-05T06:14:47Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327055#M4684</link>
      <description>&lt;P&gt;Fantastic answer! I never thought of visualizing my data in this fashion. In my initial code I had set the host names to be individual fields with thruput values. Hence I was trying to evaluate the isOutlier fields for each of the hosts.&lt;/P&gt;

&lt;P&gt;The window refers to the number of samples that I will use to find the median. E.g. window=200 means I'll compute the median over the last 200 samples.&lt;/P&gt;

&lt;P&gt;I also have a small question. When you list the data as it is by using &lt;BR /&gt;
&lt;CODE&gt;&lt;BR /&gt;
| stats avg(kbps) as kbps by _time, host &lt;BR /&gt;
| streamstats window=200 current=true median(kbps) as medianKbps&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Aren't you taking the median of Kbps across all hosts? I would like to find the median Kbps of each host.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 06:20:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327055#M4684</guid>
      <dc:creator>mngeow</dc:creator>
      <dc:date>2017-06-05T06:20:47Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327056#M4685</link>
      <description>&lt;P&gt;Yep, needed "by host" at the end of the two &lt;CODE&gt;streamstats&lt;/CODE&gt; commands.  &lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 13:07:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327056#M4685</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-06-05T13:07:25Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327057#M4686</link>
      <description>&lt;P&gt;My post highlights what you will probably eventually end up evolving to after you spend some time on this adventure. I started my journey in a similar place to where your head is at with this question. I am working on ways to scale it and make it easy to apply, so stay tuned!&lt;/P&gt;

&lt;P&gt;Applying MAD on a ton of hosts should be fine with the streamstats above, but I will be interested to see whether it accomplishes what you are after. &lt;/P&gt;

&lt;P&gt;The search above does not account for cyclical trends, but it will likely be good for a first pass at high-level monitoring for deviations, depending on the data. Out of curiosity, what KPI or trend are you applying this to?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 14:04:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327057#M4686</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2017-06-05T14:04:05Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327058#M4687</link>
      <description>&lt;P&gt;Make sure to let me know where the output of your "stay tuned" gets dumped, @mmodestino.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jun 2017 14:57:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327058#M4687</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-06-05T14:57:59Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327059#M4688</link>
      <description>&lt;P&gt;Yeah, I just figured it out. Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 02:23:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327059#M4688</guid>
      <dc:creator>mngeow</dc:creator>
      <dc:date>2017-06-06T02:23:31Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327060#M4689</link>
      <description>&lt;P&gt;@mmodestino Currently I am just detecting anomalies in my thruput. But for analyzing cyclic data such as internet traffic, which peaks during certain times of the day, wouldn't it suffice to just tweak the window size?&lt;BR /&gt;
I can see what you're trying to do with your code, but it would be hard to automate that process across multiple data types.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 02:25:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327060#M4689</guid>
      <dc:creator>mngeow</dc:creator>
      <dc:date>2017-06-06T02:25:18Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327061#M4690</link>
      <description>&lt;P&gt;Heh.  That's why they pay us the medium bucks... 'cause we can look at this stuff and figure out what to tweak.   &lt;/P&gt;

&lt;P&gt;In general, I'd probably say "no", window size won't do it.  On the other hand, you could probably set up some kind of lump categories and do your analysis based on those as well as host. &lt;/P&gt;

&lt;P&gt;For example,  run 5, 10, 15, 20, 30, 60 minute increments across a few months to create &lt;CODE&gt;avg&lt;/CODE&gt; and &lt;CODE&gt;stdev&lt;/CODE&gt; baselines by chunk, then use  &lt;CODE&gt;cluster&lt;/CODE&gt; or &lt;CODE&gt;kmeans&lt;/CODE&gt; to group them by similarity, then assign consecutive time chunks that are in the same cluster to a "slice", and do your anomaly detection by host and slice instead of just by host.&lt;/P&gt;
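&lt;P&gt;A rough sketch of that baselining-and-clustering step might look something like this (untested; the span, field names, and &lt;CODE&gt;k&lt;/CODE&gt; are illustrative assumptions, not a recommendation):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; index=_internal source=*metrics.log group=*sourcetype* series=splunkd earliest=-60d@d
 | bin _time span=30m
 | stats avg(kbps) as avgKbps stdev(kbps) as stdevKbps by _time host
 | eval chunk=strftime(_time, "%a-%H")
 | stats avg(avgKbps) as baseAvg avg(stdevKbps) as baseStdev by chunk host
 | kmeans k=4 baseAvg baseStdev
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;CLUSTERNUM&lt;/CODE&gt; field that &lt;CODE&gt;kmeans&lt;/CODE&gt; adds can then be used to group consecutive chunks into "slices" for per-slice anomaly detection.&lt;/P&gt;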

&lt;P&gt;All of which constitutes "what to tweak".&lt;/P&gt;

&lt;HR /&gt;

&lt;P&gt;I'd probably just start by defining daytype= {"weekday", "weekend", or "holiday"} and timeExpectedActivity = {"high", "med", "low", "varies"}, where a time chunk gets "high" if median(timechunk) - k * stdev(timechunk) &amp;gt; median(overall) for some selected k in {1.5-2.5} or so, or gets "varies" if stdev(timechunk) is extreme relative to stdev(overall).  &lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 15:20:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327061#M4690</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-06-06T15:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327062#M4691</link>
      <description>&lt;P&gt;No, window size won't do it, at least not for detecting something that isn't a huge spike or dip.  It is the comparison to previous values that will take you to the next level. &lt;/P&gt;

&lt;P&gt;Expecting to "automate" advanced outlier detection is wishful thinking, tbh...There will always be config to maintain, and tweak and iterate on... whether using ITSI (as black box as you are gunna get) or something like my code. &lt;/P&gt;

&lt;P&gt;I am trying to mess with the macros to see if I can make it one giant streamstats, but I don't think I can because of the need for the timewrap and series. &lt;/P&gt;

&lt;P&gt;I'm with @DalJeanis. In my adventure with this (which was similar to yours; I was looking for ways to do this on THOUSANDS of data trends) I was leaning toward storing my outlier detection searches in a lookup or the KV store, then using ML to cluster like interfaces by interface speed, avg traffic, or other identifiers. Alas, I am not a data scientist and I have not made it that far yet. &lt;/P&gt;

&lt;P&gt;Short answer is... there is no such thing as an easy button for advanced outlier detection... at least not one that is going to come through for you in the clutch.&lt;/P&gt;

&lt;P&gt;This is what you want to catch... and simple Median Abs Dev will not catch it. Catching when things degrade or change slightly is the money alert... but you are definitely on the right track, and you definitely should start with what you have to get a sense of what works (or doesn't) in your environment. &lt;/P&gt;

&lt;P&gt;&lt;IMG src="http://i.imgur.com/GxtubL9.png" alt="alt text" /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 16:36:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327062#M4691</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2017-06-06T16:36:31Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327063#M4692</link>
      <description>&lt;P&gt;Is that a fez hiding in that tidal wave?&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 02:59:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327063#M4692</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-08-30T02:59:28Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327064#M4693</link>
      <description>&lt;P&gt;LOL! &lt;/P&gt;

&lt;P&gt;I will have an update on this for you soon @mngeow. &lt;/P&gt;

&lt;P&gt;I think I have a method that will allow you to apply the same type of anomaly detection I use with timewrap and macro, with just straight streamstats. &lt;/P&gt;

&lt;P&gt;I don't think automated is the right word... but it should give you some reusable SPL or methods that will help you apply this kind of stuff in bulk. &lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 03:07:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327064#M4693</guid>
      <dc:creator>mattymo</dc:creator>
      <dc:date>2017-08-30T03:07:01Z</dc:date>
    </item>
    <item>
      <title>Re: Using streamstats with foreach command</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327065#M4694</link>
      <description>&lt;P&gt;Make sure that you hide some Easter-Egg fezzes in it!  And quit slacking: make sure you add the tassel and the &lt;CODE&gt;&amp;gt;&lt;/CODE&gt;, too!&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2017 22:44:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/Using-streamstats-with-foreach-command/m-p/327065#M4694</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-08-30T22:44:58Z</dc:date>
    </item>
  </channel>
</rss>

