<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Real Time Dashboards &amp; Post Process in Dashboards &amp; Visualizations</title>
    <link>https://community.splunk.com/t5/Dashboards-Visualizations/Real-Time-Dashboards-Post-Process/m-p/108212#M6120</link>
    <description>topic Re: Real Time Dashboards &amp; Post Process in Dashboards &amp; Visualizations</description>
    <pubDate>Sun, 03 Feb 2013 19:44:45 GMT</pubDate>
    <dc:creator>sideview</dc:creator>
    <dc:date>2013-02-03T19:44:45Z</dc:date>
    <item>
      <title>Real Time Dashboards &amp; Post Process</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Real-Time-Dashboards-Post-Process/m-p/108211#M6119</link>
      <description>&lt;P&gt;I have a fairly complex dashboard with 6 graphs on it.  The dashboard runs from a search head against multiple indexers.  The page originally used 5 real-time searches, but I've been able to compress it to 3 real-time searches, applying postprocessing to their results to drive the 6 graphs.&lt;/P&gt;

&lt;P&gt;I have noticed, however, that the dashboard takes much longer to load because of this.  Is it more efficient to have cleaner &amp;amp; more efficient real-time searches, or to take the approach I've described above?&lt;/P&gt;</description>
      <pubDate>Fri, 01 Feb 2013 22:31:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Real-Time-Dashboards-Post-Process/m-p/108211#M6119</guid>
      <dc:creator>sf_user_199</dc:creator>
      <dc:date>2013-02-01T22:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Real Time Dashboards &amp; Post Process</title>
      <link>https://community.splunk.com/t5/Dashboards-Visualizations/Real-Time-Dashboards-Post-Process/m-p/108212#M6120</link>
      <description>&lt;P&gt;It depends on the characteristics of the search. &lt;/P&gt;

&lt;P&gt;When you fold two or more searches into a single base search plus N postprocess searches, the new base search naturally has more dimensions, more statistics, or both.  Depending on how many rows the rolled-up base search produces, it might perform much better as one search plus N postprocess searches, but it &lt;EM&gt;can&lt;/EM&gt; perform better split apart.&lt;/P&gt;

&lt;P&gt;To illustrate how, let's start with the fact that the postprocess searches all have to be run against the base-search results on every update request.  In other words, each time all 4 charts update, you're asking Splunk to run 4 postprocess searches against the base results.&lt;/P&gt;

&lt;P&gt;So if the base search result only has a couple hundred rows, the postprocess searches will all run very fast, and running them every 2 seconds against real-time search results is no big deal.  And it's &lt;EM&gt;much&lt;/EM&gt; better than reading the same events off disk four times to run four slightly different simple reports.&lt;/P&gt;

&lt;P&gt;However, let's say the new base search result has 10,000 rows, or 100,000, or a million.  Running 4 postprocess searches against those rows every 2 seconds might start to take significant resources.  The resources you're hitting are quite different from those used by 4 parallel searches, but you can still eventually run up against limits.&lt;/P&gt;

&lt;P&gt;Some examples: &lt;/P&gt;

&lt;P&gt;Say you have usernames and clientips, and you're running a search over a couple of hours.  You want two charts - one showing the top users, and one showing the distinct count of users by clientip.  Both charts would read the same events off disk, so this is a clear candidate for postprocess.&lt;/P&gt;

&lt;P&gt;So our base search would be &lt;CODE&gt;stats count by username clientip&lt;/CODE&gt;, and our postprocess searches would be &lt;CODE&gt;stats sum(count) as count by username&lt;/CODE&gt; and &lt;CODE&gt;stats dc(username) by clientip&lt;/CODE&gt;.&lt;/P&gt;
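&lt;P&gt;As a sketch of the whole arrangement (the &lt;CODE&gt;index=web&lt;/CODE&gt; part is just a placeholder), the base search runs once against the events: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=web | stats count by username clientip&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Then each panel runs its own postprocess, one per chart, against those cached rows: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;stats sum(count) as count by username | sort - count
stats dc(username) by clientip&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note that the first postprocess uses &lt;CODE&gt;sum(count)&lt;/CODE&gt; rather than counting again, because the base search has already collapsed the raw events into one row per username/clientip pair.&lt;/P&gt;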

&lt;P&gt;Now let's say that each clientip is generally associated with only one user, or a few users.  And vice versa - each user is generally on only one clientip.  And there are a few hundred users and a few hundred clientips.  This means the "rolled-up" search will have a number of rows that's only a couple times larger than the distinct number of users or clientips considered alone.  This will be fine - the postprocess searches will run fast.&lt;/P&gt;

&lt;P&gt;Let's change it, though, and say we're correlating usernames with another field that has a very high distinct count.  Say it's a request id, where every single event has its own request id.  If we do &lt;CODE&gt;stats count by username, request_id&lt;/CODE&gt;, the result set will be just as long as the underlying set of &lt;EM&gt;events&lt;/EM&gt;, possibly in the hundreds of thousands or millions.  Postprocess searches running against this result will take a lot longer, so 4 of them running every 2 seconds might start to tax CPU or RAM, at the expense of something else.  And when the search is distributed it gets a little more complicated and technical, but the same guidelines apply.&lt;/P&gt;
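&lt;P&gt;A special case of this kind of high-distinct-count field is &lt;CODE&gt;_time&lt;/CODE&gt;, and the usual fix is to bin it before aggregating.  As a sketch (the index name and the span are just placeholders): &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=web | bin _time span=1m | stats count by username clientip _time&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With one-minute buckets over a couple of hours, &lt;CODE&gt;_time&lt;/CODE&gt; contributes at most about 120 distinct values instead of one per event.&lt;/P&gt;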

&lt;P&gt;I hope this explanation makes sense.  As a further point, this is also why it's so important to use the &lt;CODE&gt;bin&lt;/CODE&gt; command to bin the &lt;CODE&gt;_time&lt;/CODE&gt; values when &lt;CODE&gt;_time&lt;/CODE&gt; is one of your "by" fields in a base search.  The &lt;CODE&gt;_time&lt;/CODE&gt; values almost always have a very high distinct count, so &lt;CODE&gt;stats count by username clientip _time&lt;/CODE&gt; will produce a very large result - basically as large as the underlying set of events you're trying to aggregate.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Feb 2013 19:44:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Dashboards-Visualizations/Real-Time-Dashboards-Post-Process/m-p/108212#M6120</guid>
      <dc:creator>sideview</dc:creator>
      <dc:date>2013-02-03T19:44:45Z</dc:date>
    </item>
  </channel>
</rss>

