<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic sum(x/y) and sum(x)/sum(y) in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75182#M18991</link>
    <description>&lt;P&gt;This has stumped me for too long so I'm opening it up to the experts.&lt;/P&gt;

&lt;P&gt;I have some event data of format "timestamp, Category, SubCategory, X, Y". The data has already been processed by the source system so that there are, say, 5000 events per timestamp. I want to filter on the Category, leaving about 500 relevant events to process.&lt;/P&gt;

&lt;P&gt;X is a data volume (e.g. through an interface). I can generate the total using sum(X). &lt;/P&gt;

&lt;P&gt;Y gives me the number of users. There is a fixed value of Y per SubCategory, so that for a given Category values(Y) would contain 4 or 5 data points. This is not really a problem as I can return the correct value in a subsearch through "join Category,SubCategory [search | stats | mvexpand | stats ]"&lt;/P&gt;

&lt;P&gt;I want to calculate the average per-user volume for X for a given category and also for each subCategory within the category. &lt;/P&gt;

&lt;P&gt;Using stats gives me:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;&lt;BR /&gt;
SubCategory   UsersInSubCategory   sum(X)   sum(X/Y)&lt;BR /&gt;
   A             100               100MB    1MB&lt;BR /&gt;
   B             200               200MB    1MB&lt;BR /&gt;
 Totals          300               300MB    2MB&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This is correct when breaking out by SubCategory, but for the whole Category I cannot use sum(X/Y) as what I want is sum(X)/sum(Y). Since the underlying events are in the thousands avg(X/Y) gives me  averages per event, not per SubCategory. &lt;/P&gt;

&lt;P&gt;I'm beginning to think two queries might be easier.&lt;/P&gt;</description>
    <pubDate>Wed, 14 Sep 2011 12:06:38 GMT</pubDate>
    <dc:creator>inglisn</dc:creator>
    <dc:date>2011-09-14T12:06:38Z</dc:date>
    <item>
      <title>sum(x/y) and sum(x)/sum(y)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75182#M18991</link>
      <description>&lt;P&gt;This has stumped me for too long so I'm opening it up to the experts.&lt;/P&gt;

&lt;P&gt;I have some event data of format "timestamp, Category, SubCategory, X, Y". The data has already been processed by the source system so that there are, say, 5000 events per timestamp. I want to filter on the Category, leaving about 500 relevant events to process.&lt;/P&gt;

&lt;P&gt;X is a data volume (e.g. through an interface). I can generate the total using sum(X). &lt;/P&gt;

&lt;P&gt;Y gives me the number of users. There is a fixed value of Y per SubCategory, so that for a given Category values(Y) would contain 4 or 5 data points. This is not really a problem as I can return the correct value in a subsearch through "join Category,SubCategory [search | stats | mvexpand | stats ]"&lt;/P&gt;

&lt;P&gt;I want to calculate the average per-user volume for X for a given category and also for each subCategory within the category. &lt;/P&gt;

&lt;P&gt;Using stats gives me:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;&lt;BR /&gt;
SubCategory   UsersInSubCategory   sum(X)   sum(X/Y)&lt;BR /&gt;
   A             100               100MB    1MB&lt;BR /&gt;
   B             200               200MB    1MB&lt;BR /&gt;
 Totals          300               300MB    2MB&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This is correct when breaking out by SubCategory, but for the whole Category I cannot use sum(X/Y) as what I want is sum(X)/sum(Y). Since the underlying events are in the thousands avg(X/Y) gives me  averages per event, not per SubCategory. &lt;/P&gt;

&lt;P&gt;I'm beginning to think two queries might be easier.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Sep 2011 12:06:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75182#M18991</guid>
      <dc:creator>inglisn</dc:creator>
      <dc:date>2011-09-14T12:06:38Z</dc:date>
    </item>
    <item>
      <title>Re: sum(x/y) and sum(x)/sum(y)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75183#M18992</link>
      <description>&lt;P&gt;hi! have you tried using the eval command to calculate sum(X)/sum(Y)?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | stats sum(X) AS x sum(Y) AS y | eval r=x/y
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;you can read more about eval in the search reference: &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval"&gt;http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 14 Sep 2011 17:20:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75183#M18992</guid>
      <dc:creator>sophy</dc:creator>
      <dc:date>2011-09-14T17:20:52Z</dc:date>
    </item>
    <item>
      <title>Re: sum(x/y) and sum(x)/sum(y)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75184#M18993</link>
      <description>&lt;P&gt;I did try that. I was expecting that the first stats would generate a table which can be piped onwards through more functions so I could generate averages over that summary table (4 or 5 rows) however I cant get the value of 'r' to appear anywhere. Piping the above into another stats results in no results.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2011 09:35:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75184#M18993</guid>
      <dc:creator>inglisn</dc:creator>
      <dc:date>2011-09-15T09:35:43Z</dc:date>
    </item>
    <item>
      <title>Re: sum(x/y) and sum(x)/sum(y)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75185#M18994</link>
      <description>&lt;P&gt;Instead of piping from one stats command into a second stats command, why not try using eventstats to calculate the sums?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | eventstats sum(X) as sumX, sum(Y) as sumY | eval r=sumX/sumY | stats values(r)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 15 Sep 2011 16:23:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75185#M18994</guid>
      <dc:creator>acdevlin</dc:creator>
      <dc:date>2011-09-15T16:23:14Z</dc:date>
    </item>
    <item>
      <title>Re: sum(x/y) and sum(x)/sum(y)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75186#M18995</link>
      <description>&lt;P&gt;OK, so I played around an piping into eval started working. Go figure. Here's my current solution:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;&lt;BR /&gt;
search and filter on Category&lt;BR /&gt;
  | stats first(_time) as timestr, values(Y) as y, sum(X) as x by date_year, date_month, date_mday, date_hour, SubCategory&lt;BR /&gt;
  | convert timeformat="%Y-%m-%d %H:00" ctime(timestr) as DateTime &lt;BR /&gt;
  | stats sum(y) as y, sum(x) as x by DateTime, SubCategory&lt;BR /&gt;
  | eval r=x/y &lt;BR /&gt;
  | chart sum(r) by DateTime, SubCategory&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;The raw data is grouped into 300s clumps comprising several thousand unique events. The first stats command creates the period over which the summaries are built and includes a timestamp which is used by the second stats to build the actual table. Then we eval the average and pipe into chart to produce something that can be put into a graph. &lt;/P&gt;

&lt;P&gt;This allows me to keep or drop the SubCategory grouping in the second stats and the chart clauses and calculate the correct average over the Category. &lt;/P&gt;

&lt;P&gt;By using stats I've lost the event timestamp so needed to manually re-insert it. I works, but not as nicely as timechart as the X-Axis strings are quite long and a bit ugly when plotting over a week (168 points per series). Changing it to summarise over different time periods would involve changing the 2nd and 3rd lines. &lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:53:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/sum-x-y-and-sum-x-sum-y/m-p/75186#M18995</guid>
      <dc:creator>inglisn</dc:creator>
      <dc:date>2020-09-28T09:53:17Z</dc:date>
    </item>
  </channel>
</rss>

