<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Summary indexing and multiple transforming commands in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291039#M2556</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I have a search that runs over a few weeks of event data from a website. It should return (among other things) the average number of pageviews per user and the standard deviation of this number. I want to create a summary index of this data. I am using two transforming commands to achieve this, and I replace the final one with its si- variant to create the summary index. However, the results are not the same as when I just run the original search (over the same time range) against the raw data.&lt;/P&gt;

&lt;P&gt;The search looks like this:&lt;/P&gt;

&lt;P&gt;event="pageview"&lt;BR /&gt;
| rename ...&lt;BR /&gt;
| eval Variant = ... , deviceGroup = ...&lt;BR /&gt;
| stats count as pageview_per_user by userID, Variant, deviceGroup&lt;BR /&gt;
| sistats dc(userID) as users, sum(pageview_per_user) as pageviews, avg(pageview_per_user) as avg_pv, stdev(pageview_per_user) as std_pv by Variant, deviceGroup&lt;/P&gt;

&lt;P&gt;In particular, the total number of pageviews is way off when I use the summary index. The search that populates the summary index runs every hour over the time range from 1 August until -1h@h.&lt;/P&gt;

&lt;P&gt;What could be the problem here? Could it be the use of the two (si)stats commands?&lt;/P&gt;

&lt;P&gt;By the way, I realize I could probably use report acceleration here, but I want to understand summary indexing better.&lt;/P&gt;

&lt;P&gt;Best, Jacob&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 15:26:36 GMT</pubDate>
    <dc:creator>JacobPN</dc:creator>
    <dc:date>2020-09-29T15:26:36Z</dc:date>
    <item>
      <title>Summary indexing and multiple transforming commands</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291039#M2556</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I have a search that runs over a few weeks of event data from a website. It should return (among other things) the average number of pageviews per user and the standard deviation of this number. I want to create a summary index of this data. I am using two transforming commands to achieve this, and I replace the final one with its si- variant to create the summary index. However, the results are not the same as when I just run the original search (over the same time range) against the raw data.&lt;/P&gt;

&lt;P&gt;The search looks like this:&lt;/P&gt;

&lt;P&gt;event="pageview"&lt;BR /&gt;
| rename ...&lt;BR /&gt;
| eval Variant = ... , deviceGroup = ...&lt;BR /&gt;
| stats count as pageview_per_user by userID, Variant, deviceGroup&lt;BR /&gt;
| sistats dc(userID) as users, sum(pageview_per_user) as pageviews, avg(pageview_per_user) as avg_pv, stdev(pageview_per_user) as std_pv by Variant, deviceGroup&lt;/P&gt;

&lt;P&gt;In particular, the total number of pageviews is way off when I use the summary index. The search that populates the summary index runs every hour over the time range from 1 August until -1h@h.&lt;/P&gt;

&lt;P&gt;What could be the problem here? Could it be the use of the two (si)stats commands?&lt;/P&gt;

&lt;P&gt;By the way, I realize I could probably use report acceleration here, but I want to understand summary indexing better.&lt;/P&gt;

&lt;P&gt;Best, Jacob&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 15:26:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291039#M2556</guid>
      <dc:creator>JacobPN</dc:creator>
      <dc:date>2020-09-29T15:26:36Z</dc:date>
    </item>
    <item>
      <title>Re: Summary indexing and multiple transforming commands</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291040#M2557</link>
      <description>&lt;P&gt;Looks like you are doing stats over stats.&lt;BR /&gt;
Try arranging the data according to your needs, then use the &lt;CODE&gt;| collect&lt;/CODE&gt; command to send the results to a summary index.&lt;BR /&gt;
Read more here:&lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Collect"&gt;http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Collect&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 12:13:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291040#M2557</guid>
      <dc:creator>adonio</dc:creator>
      <dc:date>2017-08-16T12:13:12Z</dc:date>
    </item>
    <item>
      <title>Re: Summary indexing and multiple transforming commands</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291041#M2558</link>
      <description>&lt;P&gt;Thanks! Could you explain a little about why the stats over stats search I wrote gives the wrong results?&lt;/P&gt;

&lt;P&gt;And should I just replace sistats with stats, pipe everything to collect, and schedule the search to run every hour? Should I still check "Enable" under summary indexing when I save the search?&lt;BR /&gt;
Sorry for my ignorance.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 13:56:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291041#M2558</guid>
      <dc:creator>JacobPN</dc:creator>
      <dc:date>2017-08-16T13:56:32Z</dc:date>
    </item>
    <item>
      <title>Re: Summary indexing and multiple transforming commands</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291042#M2559</link>
      <description>&lt;P&gt;There is nothing wrong with two stats commands, as long as they are aggregating the information you want them to collect.  Since the aggregation is running hourly, across a longer time frame, you need to add &lt;CODE&gt;_time&lt;/CODE&gt; into the first &lt;CODE&gt;stats&lt;/CODE&gt; in order to collect valid comparison data.&lt;/P&gt;

&lt;P&gt;Run the search below across your longer time frame and see how well it matches a pull from your summary for the same time frame.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; event="pageview"
 | rename ...
 | eval Variant = ... , deviceGroup = ...
 | bin _time span=1h
 | stats count as pageview_per_user by userID, Variant, deviceGroup, _time
 | stats dc(userID) as users, sum(pageview_per_user) as pageviews, avg(pageview_per_user) as avg_pv, stdev(pageview_per_user) as std_pv by Variant, deviceGroup
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;or, with &lt;CODE&gt;sistats&lt;/CODE&gt; as the final command:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; | sistats dc(userID) as users, sum(pageview_per_user) as pageviews, avg(pageview_per_user) as avg_pv, stdev(pageview_per_user) as std_pv by Variant, deviceGroup
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 16 Aug 2017 15:50:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Summary-indexing-and-multiple-transforming-commands/m-p/291042#M2559</guid>
      <dc:creator>DalJeanis</dc:creator>
      <dc:date>2017-08-16T15:50:06Z</dc:date>
    </item>
  </channel>
</rss>

