<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Optimize query that hits disk usage limit when computing stats in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</link>
    <description>&lt;P&gt;It could be simplified even further:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;base_search&amp;gt;
| table _time ResponseStatus | fields - _raw 
| bucket _time span=1d
| fillnull value=504 ResponseStatus
| top 100 ResponseStatus by _time showcount=f
| timechart limit=30 span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 27 Jan 2022 22:57:15 GMT</pubDate>
    <dc:creator>johnhuang</dc:creator>
    <dc:date>2022-01-27T22:57:15Z</dc:date>
    <item>
      <title>Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582600#M202913</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;What I'm trying to do is to have a chart with time on the x-axis and percentages by &lt;STRONG&gt;ResponseStatus&lt;/STRONG&gt; on the y-axis.&amp;nbsp;&lt;/P&gt;&lt;P&gt;To do that I came up with the Splunk search query below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some http requests
| fields _time,ResponseStatus,RequestName
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "totalCount" by Date
| eventstats count as "codeCount" by Date,ResponseStatus
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;But it is hitting the disk usage limit (500MB, which I can't increase) for a 10-day interval, and I'd like to be able to run this over a 3-4 month interval.&lt;BR /&gt;&lt;BR /&gt;What I have noticed is that if I only run the match part of the query, I get all the events without hitting any disk limit, which makes me think the problem is with the counting and group-by part of the query.&lt;/P&gt;&lt;P&gt;My guess is that Splunk is making the computation by keeping in memory (or trying to do so and eventually swapping to disk) the full event message even if I specified the useful fields via the&amp;nbsp;&lt;STRONG&gt;fields&amp;nbsp;&lt;/STRONG&gt;command.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way to either effectively have Splunk ignore all the remaining parts of the message, or to obtain the same result via a different path?&lt;BR /&gt;&lt;BR /&gt;Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jan 2022 19:08:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582600#M202913</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-26T19:08:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582648#M202929</link>
      <description>&lt;P&gt;Reducing the use of eventstats is always good.&amp;nbsp; Secondly, you can use &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.0/SearchReference/Table" target="_blank"&gt;table&lt;/A&gt; to reduce event (row) size; &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.2.0/SearchReference/Fields" target="_blank"&gt;fields&lt;/A&gt; doesn't do quite that.&lt;/P&gt;&lt;P&gt;In your example, you can eliminate one of the eventstats like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some http requests
| table _time,ResponseStatus,RequestName ``` fields does not reduce row size ```
| eval Date=strftime(_time, "%m/%d/%Y")
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| stats count as "codeCount" by Date,ResponseStatus
| eventstats sum(codeCount) as "totalCount" by Date
| eval percent=round((codeCount/totalCount)*100)
| chart values(percent) by Date,ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 03:55:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582648#M202929</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2022-01-27T03:55:49Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582682#M202935</link>
      <description>Please remember that when you replace fields with table, you move processing from the indexers to the search head! Anyhow, stats should strip those additional fields away anyway.&lt;BR /&gt;r. Ismo</description>
      <pubDate>Thu, 27 Jan 2022 07:19:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582682#M202935</guid>
      <dc:creator>isoutamo</dc:creator>
      <dc:date>2022-01-27T07:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582751#M202954</link>
      <description>&lt;P&gt;Where did you learn that &lt;FONT face="courier new,courier"&gt;fields&lt;/FONT&gt; does not reduce row size?&amp;nbsp; This contradicts what we've been told over many years.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 19:11:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582751#M202954</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-01-27T19:11:30Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582752#M202955</link>
      <description>&lt;P&gt;You might explicitly remove _raw early in the pipeline so that you operate only on the set of fields you need without dragging more data around.&lt;/P&gt;&lt;PRE&gt;| fields - _raw&lt;/PRE&gt;&lt;P&gt;Furthermore, why do you render _time as a string? You might use bin/bucket to align the data to full days, or whatever time unit you need, without creating additional fields.&lt;/P&gt;&lt;P&gt;You might also rethink your eventstats. If you already calculate a count by date and ResponseStatus, why calculate by date again? You might take the count you already have and sum it over each date.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 13:40:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582752#M202955</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-01-27T13:40:32Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582826#M202969</link>
      <description>&lt;P&gt;Thank you! Converting that eventstats into stats made it work. The use of table didn't seem to have an impact on overall performance.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 20:12:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582826#M202969</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-27T20:12:58Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582827#M202970</link>
      <description>&lt;P&gt;Thank you! Your message made me realize I could write the query in another way. This is where I landed after the changes suggested by&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/33901"&gt;@yuanliu&lt;/a&gt;&amp;nbsp; + your message:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;match some stuff
| fields _time,ResponseStatus,RequestName
| fields - _raw
| bucket _time span=1d
| eval ResponseStatus=if(isnull(ResponseStatus), 504, ResponseStatus)
| eventstats count as "total" by _time 
| stats count first(total) as "total" by _time, ResponseStatus 
| eval percent=(count/total)*100 
| timechart span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Most likely further improvements can be made here as well, but as long as it is not hitting the disk threshold I'm happy with it &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 20:11:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582827#M202970</guid>
      <dc:creator>cmontanari</dc:creator>
      <dc:date>2022-01-27T20:11:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</link>
      <description>&lt;P&gt;It could be simplified even further:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;base_search&amp;gt;
| table _time ResponseStatus | fields - _raw 
| bucket _time span=1d
| fillnull value=504 ResponseStatus
| top 100 ResponseStatus by _time showcount=f
| timechart limit=30 span=1d first(percent) by ResponseStatus&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:57:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582833#M202972</guid>
      <dc:creator>johnhuang</dc:creator>
      <dc:date>2022-01-27T22:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582837#M202974</link>
      <description>&lt;P&gt;No, you don't want to do that. Firstly, removing _raw is not needed. More importantly, table is a transforming command and moves further processing to the search head, which kills parallelization.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:21:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582837#M202974</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2022-01-27T22:21:39Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582838#M202975</link>
      <description>&lt;P&gt;You know, adding table and removing _raw was actually slightly slower in my testing and took up slightly more disk, but the difference was too small for me to be sure.&amp;nbsp; You were the one who first recommended removing _raw, and someone else here recommended table.&lt;/P&gt;&lt;PRE&gt;| table _time ResponseStatus | fields - _raw &lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 22:56:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582838#M202975</guid>
      <dc:creator>johnhuang</dc:creator>
      <dc:date>2022-01-27T22:56:40Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize query that hits disk usage limit when computing stats</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582888#M202988</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/213957"&gt;@richgalloway&lt;/a&gt;&amp;nbsp;You and&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/214410"&gt;@isoutamo&lt;/a&gt;&amp;nbsp;are correct. &amp;nbsp;I didn't realize that events and fields are two separate spaces. &lt;FONT face="courier new,courier"&gt;fields&lt;/FONT&gt; allows events (_raw) to carry on, but search buffers are not burdened by them. &amp;nbsp;As isoutamo points out, &lt;FONT face="courier new,courier"&gt;table&lt;/FONT&gt; may actually carry a higher performance penalty.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 09:03:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Optimize-query-that-hits-disk-usage-limit-when-computing-stats/m-p/582888#M202988</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2022-01-28T09:03:47Z</dc:date>
    </item>
  </channel>
</rss>

