<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to calculate concurrency distribution in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153701#M185637</link>
    <description>&lt;P&gt;OK.  Everything before the last line in your question &lt;CODE&gt;| stats count(concurrency) by concurrency&lt;/CODE&gt; is from our previous question over at &lt;A href="http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html"&gt;http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html&lt;/A&gt;   so this is really more concerned with looking at a "frequency distribution of concurrency", not the nuts and bolts of how that concurrency is calculated. &lt;/P&gt;

&lt;P&gt;Using &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt; like you are here will produce results that look like what you want, but it'll be an extremely misleading visualization.   The reason is that the rows coming into the stats command are all either the start time or the end time of a call.   There is no representation for the time in between these points.  &lt;/P&gt;

&lt;P&gt;To see why this is a problem, let's look at a specific situation.  Let's analyze a time period from 1pm to 2pm, and say we have 5 calls that all start right at 1pm and are each 15 minutes long.  Intuitively we expect a frequency distribution with a lot of concurrency=0 and a fair bit of concurrency=5, since the system is idle for 45 of those 60 minutes.   &lt;/P&gt;

&lt;P&gt;However, let's pipe such a set into &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt;.  It will give us a distribution, but every value of concurrency from 1 to 5 will simply have a count of 2: one row for each call's start and one for its end.  Nothing in that output reflects how long the system sat at each level, and concurrency=0, where the system spends 45 of the 60 minutes, never appears at all.  This doesn't match our expectations very well.  &lt;/P&gt;

&lt;P&gt;Now consider 5 other calls that start at 2pm but are each only 1 minute long.  The &lt;CODE&gt;stats count by concurrency&lt;/CODE&gt; output for these is exactly the same as it was for the first calls, even though they occupy the system for a fraction of the time.  If we consider a time period that covers both sets of calls, it'll be the same chart with all y-axis values doubled.  o_O&lt;/P&gt;

&lt;P&gt;Instead I think it makes more sense to analyze some discrete unit of time like minutes or seconds, and look at the distribution of concurrency values for each sourceip &lt;EM&gt;across all the time periods&lt;/EM&gt;.  &lt;/P&gt;

&lt;P&gt;In the search below I'm telling timechart to use not 1 second but rather 15 minutes as our bucket size, but that's up to you.  The search language looks a lot like what we did in the last question, except at the end we have extra &lt;CODE&gt;untable&lt;/CODE&gt; and &lt;CODE&gt;chart&lt;/CODE&gt; commands. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... all the stuff before up to 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart span=15min max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by sourceip limit=20 
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;=coalesce('max_concurrency: &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;','last_concurrency: &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;')] 
| fields - last_concurrency* max_concurrency*
| untable _time sourceip concurrency
| chart count over concurrency by sourceip
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That search above will analyze all the 15 minute periods for each sourceip, and give you a frequency distribution for each sourceip value, and graph them all as separate lines on the same frequency distribution chart.  &lt;/P&gt;

&lt;P&gt;If instead you want to just see a single overarching frequency distribution of "per-sourceip concurrency", replace that last &lt;CODE&gt;chart&lt;/CODE&gt; command with our old friend,   {drum fill}    &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt;.    &lt;/P&gt;</description>
    <pubDate>Wed, 22 Apr 2015 04:19:42 GMT</pubDate>
    <dc:creator>sideview</dc:creator>
    <dc:date>2015-04-22T04:19:42Z</dc:date>
    <item>
      <title>How to calculate concurrency distribution</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153699#M185635</link>
      <description>&lt;P&gt;I have the following event that needs to calculate concurrency distribution:&lt;/P&gt;

&lt;P&gt;Event, starttime=yyyy-mm-dd hh:mm:ss, duration=, sourceip=a.b.c.d&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| rex "duration=(?&amp;lt;Duration&amp;gt;.*?),"
| eval StartTime=round(strptime(startTime,"%Y-%m-%dT%H:%M:%SZ"),0)
| eval _time=StartTime
| eval increment = mvappend("1","-1")
| mvexpand increment
| eval _time = if(increment==1, _time, _time + Duration)
| sort 0 + _time
| fillnull sourceip value="NULL"
| streamstats sum(increment) as post_concurrency by sourceip
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| stats count(concurrency) by concurrency&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I want to take a look at the concurrency distribution to find out if it is matching a z-distribution. &lt;/P&gt;

&lt;P&gt;It seems to be giving me the data I want, but I would like to get your opinion.  I am not sure of the granularity of concurrency here; is it counted by the second?  Both StartTime and Duration are down to the second.&lt;/P&gt;

&lt;P&gt;Is there a way to make concurrency as x-Axis, and count(concurrency) as y-Axis?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 19:36:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153699#M185635</guid>
      <dc:creator>jgcsco</dc:creator>
      <dc:date>2020-09-28T19:36:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate concurrency distribution</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153700#M185636</link>
      <description>&lt;P&gt;I have posted another related question at &lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html"&gt;http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2015 23:43:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153700#M185636</guid>
      <dc:creator>jgcsco</dc:creator>
      <dc:date>2015-04-21T23:43:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate concurrency distribution</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153701#M185637</link>
      <description>&lt;P&gt;OK.  Everything before the last line in your question &lt;CODE&gt;| stats count(concurrency) by concurrency&lt;/CODE&gt; is from our previous question over at &lt;A href="http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html"&gt;http://answers.splunk.com/answers/227393/how-to-use-the-concurrency-command-to-timechart-th.html&lt;/A&gt;   so this is really more concerned with looking at a "frequency distribution of concurrency", not the nuts and bolts of how that concurrency is calculated. &lt;/P&gt;

&lt;P&gt;Using &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt; like you are here will produce results that look like what you want, but it'll be an extremely misleading visualization.   The reason is that the rows coming into the stats command are all either the start time or the end time of a call.   There is no representation for the time in between these points.  &lt;/P&gt;

&lt;P&gt;To see why this is a problem, let's look at a specific situation.  Let's analyze a time period from 1pm to 2pm, and say we have 5 calls that all start right at 1pm and are each 15 minutes long.  Intuitively we expect a frequency distribution with a lot of concurrency=0 and a fair bit of concurrency=5, since the system is idle for 45 of those 60 minutes.   &lt;/P&gt;

&lt;P&gt;However, let's pipe such a set into &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt;.  It will give us a distribution, but every value of concurrency from 1 to 5 will simply have a count of 2: one row for each call's start and one for its end.  Nothing in that output reflects how long the system sat at each level, and concurrency=0, where the system spends 45 of the 60 minutes, never appears at all.  This doesn't match our expectations very well.  &lt;/P&gt;
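
&lt;P&gt;If it helps to check that arithmetic, here is a rough Python sketch of the question's +1/-1 expansion for the 1pm example (times in seconds, names mirroring the search's fields).  Whether you count the adjusted &lt;CODE&gt;concurrency&lt;/CODE&gt; field or the raw running sum &lt;CODE&gt;post_concurrency&lt;/CODE&gt;, each level gets only one or two rows, with nothing weighting the 15 busy minutes against the 45 idle ones:&lt;/P&gt;

```python
# Simulate the +1/-1 event expansion (mvexpand increment / streamstats sum)
# for the example: five 15-minute calls, all starting at 1pm (t=0 seconds).
from collections import Counter

calls = [(0, 900)] * 5            # (start_seconds, duration_seconds)

events = []                       # one +1 row per start, one -1 row per end
for start, dur in calls:
    events.append((start, 1))
    events.append((start + dur, -1))
events.sort()                     # like "| sort 0 + _time"

running = 0
post, adjusted = Counter(), Counter()
for t, inc in events:
    running += inc                # streamstats sum(increment)
    post[running] += 1            # post_concurrency
    # the question's eval: end rows report the level *during* the call
    adjusted[running + 1 if inc == -1 else running] += 1

print(dict(adjusted))   # {1: 2, 2: 2, 3: 2, 4: 2, 5: 2}
print(dict(post))       # {1: 2, 2: 2, 3: 2, 4: 2, 5: 1, 0: 1}
```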

&lt;P&gt;Now consider 5 other calls that start at 2pm but are each only 1 minute long.  The &lt;CODE&gt;stats count by concurrency&lt;/CODE&gt; output for these is exactly the same as it was for the first calls, even though they occupy the system for a fraction of the time.  If we consider a time period that covers both sets of calls, it'll be the same chart with all y-axis values doubled.  o_O&lt;/P&gt;

&lt;P&gt;Instead I think it makes more sense to analyze some discrete unit of time like minutes or seconds, and look at the distribution of concurrency values for each sourceip &lt;EM&gt;across all the time periods&lt;/EM&gt;.  &lt;/P&gt;

&lt;P&gt;In the search below I'm telling timechart to use not 1 second but rather 15 minutes as our bucket size, but that's up to you.  The search language looks a lot like what we did in the last question, except at the end we have extra &lt;CODE&gt;untable&lt;/CODE&gt; and &lt;CODE&gt;chart&lt;/CODE&gt; commands. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... all the stuff before up to 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart span=15min max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by sourceip limit=20 
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;=coalesce('max_concurrency: &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;','last_concurrency: &amp;lt;&amp;lt;MATCHSTR&amp;gt;&amp;gt;')] 
| fields - last_concurrency* max_concurrency*
| untable _time sourceip concurrency
| chart count over concurrency by sourceip
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;That search above will analyze all the 15 minute periods for each sourceip, and give you a frequency distribution for each sourceip value, and graph them all as separate lines on the same frequency distribution chart.  &lt;/P&gt;
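
&lt;P&gt;To see the mechanics outside of SPL, here is a rough Python sketch of the same idea for a single sourceip, using made-up call data (the 1pm and 2pm calls from the examples above).  It buckets time into 15-minute spans, takes each bucket's max concurrency, fills quiet buckets down from the previous bucket's ending value, and counts buckets per level, so the idle time finally shows up as lots of concurrency=0:&lt;/P&gt;

```python
# Rough single-sourceip analogue of the timechart/untable/chart approach.
# Call data is made up: five 15-minute calls at 1pm (t=0) plus five
# 1-minute calls at 2pm (t=3600), over a 2-hour window.
from collections import Counter

SPAN = 900                        # 15-minute buckets, in seconds
N_BUCKETS = 2 * 3600 // SPAN      # 2 hours = 8 buckets
calls = [(0, 900)] * 5 + [(3600, 60)] * 5

events = []                       # +1 at each start, -1 at each end
for start, dur in calls:
    events.append((start, 1))
    events.append((start + dur, -1))
events.sort()

# Per bucket: max running concurrency, and the value it ends at.
bucket_max, bucket_last = {}, {}
running = 0
for t, inc in events:
    running += inc
    b = t // SPAN
    bucket_max[b] = max(bucket_max.get(b, 0), running)
    bucket_last[b] = running

# coalesce(max, filled-down last): a quiet bucket inherits the level the
# previous bucket ended at, like "| filldown last_concurrency*".
per_bucket, last = [], 0
for b in range(N_BUCKETS):
    per_bucket.append(bucket_max.get(b, last))
    last = bucket_last.get(b, last)

print(per_bucket)                 # [5, 4, 0, 0, 5, 0, 0, 0]
print(Counter(per_bucket))        # Counter({0: 5, 5: 2, 4: 1})
```

Unlike the row-counting version, this distribution shows the system idle in 5 of the 8 buckets, which matches what actually happened.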

&lt;P&gt;If instead you want to just see a single overarching frequency distribution of "per-sourceip concurrency", replace that last &lt;CODE&gt;chart&lt;/CODE&gt; command with our old friend,   {drum fill}    &lt;CODE&gt;| stats count by concurrency&lt;/CODE&gt;.    &lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2015 04:19:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153701#M185637</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2015-04-22T04:19:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate concurrency distribution</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153702#M185638</link>
      <description>&lt;P&gt;Thanks sideview for the detailed information. Will give it a try when I have a chance. &lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2015 15:29:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-calculate-concurrency-distribution/m-p/153702#M185638</guid>
      <dc:creator>jgcsco</dc:creator>
      <dc:date>2015-05-04T15:29:10Z</dc:date>
    </item>
  </channel>
</rss>

