<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best Practice for Updating Summary Indexed Data in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98222#M1003</link>
    <description>&lt;P&gt;I think that's functionally similar to having the summary index queries re-run over the same data, just slightly more manual (or at least outside Splunk's control). But I definitely use it periodically, when I need to backfill a larger amount of data.&lt;/P&gt;</description>
    <pubDate>Tue, 10 May 2011 17:04:51 GMT</pubDate>
    <dc:creator>David</dc:creator>
    <dc:date>2011-05-10T17:04:51Z</dc:date>
    <item>
      <title>Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98220#M1001</link>
      <description>&lt;P&gt;I'd like to see if there's a "right" way to solve this problem. A lot of the data I summary index on an hourly basis arrives late: most of it gets into the system 30 to 90 minutes late, and some of it up to 48 hours late. The data volume is such that I need both hourly and daily summary indexing to keep searches reasonable.&lt;/P&gt;

&lt;P&gt;What I've been doing so far is running my hourly search at half past the hour, with the time window &lt;CODE&gt;et=-3h@h lt=-2h@h&lt;/CODE&gt;. Then I run a second search after midnight that generates hourly summaries again for &lt;CODE&gt;et=-2d@d lt=@h&lt;/CODE&gt;. All of my searches tolerate duplicate entries by running &lt;CODE&gt;| stats first(myvar) by _time&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;This works acceptably, but is a bit kludgy. Apart from getting the data in real-time (if wishes were horses), is there a better way to approach this? (This question is related to, but different from &lt;A href="http://splunk-base.splunk.com/answers/13379/how-to-better-deal-with-gaps-in-remote-data"&gt;another question of mine&lt;/A&gt;.)&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2011 19:35:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98220#M1001</guid>
      <dc:creator>David</dc:creator>
      <dc:date>2011-05-09T19:35:46Z</dc:date>
    </item>
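The `| stats first(myvar) by _time` approach works because overlapping windows only add extra copies of an hourly summary row. A minimal Python sketch of that collapse, assuming each summary row is a dict with `_time` and `myvar` keys and that rows arrive in search result order (a hypothetical model for illustration, not Splunk's implementation):

```python
def first_by_time(rows, field):
    """Mimic `| stats first(field) by _time`: keep the first value seen per _time.

    Assumes `rows` is in search result order. If a later run saw late events
    and wrote a different value, whichever row comes first is the one kept.
    """
    result = {}
    for row in rows:
        # stats skips rows where the field is missing
        if field in row and row["_time"] not in result:
            result[row["_time"]] = row[field]
    return result


rows = [
    {"_time": 3600, "myvar": 5},  # written by the hourly run
    {"_time": 3600, "myvar": 5},  # same hour, summarized again by the nightly run
    {"_time": 7200, "myvar": 7},
]
print(first_by_time(rows, "myvar"))  # {3600: 5, 7200: 7}
```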
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98221#M1002</link>
      <description>&lt;P&gt;How about using &lt;CODE&gt;fill_summary_index.py&lt;/CODE&gt; to backfill the missing data? Take a look at the following:&lt;BR /&gt;
&lt;A href="http://www.splunk.com/base/Documentation/latest/Knowledge/Managesummaryindexgapsandoverlaps#Use_the_backfill_script_to_add_other_data_or_fill_summary_index_gaps"&gt;Manage summary index gaps and overlaps&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2011 16:09:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98221#M1002</guid>
      <dc:creator>ftk</dc:creator>
      <dc:date>2011-05-10T16:09:44Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98222#M1003</link>
      <description>&lt;P&gt;I think that's functionally similar to having the summary index queries re-run over the same data, just slightly more manual (or at least outside Splunk's control). But I definitely use it periodically, when I need to backfill a larger amount of data.&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2011 17:04:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98222#M1003</guid>
      <dc:creator>David</dc:creator>
      <dc:date>2011-05-10T17:04:51Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98223#M1004</link>
      <description>&lt;P&gt;True, true. I figured, why not set up a cron job with that script and let it handle the gaps instead of the searches? Not sure whether that would yield any performance benefit, though.&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2011 17:14:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98223#M1004</guid>
      <dc:creator>ftk</dc:creator>
      <dc:date>2011-05-10T17:14:05Z</dc:date>
    </item>
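A cron-driven backfill could look roughly like this Python sketch, which only builds the command line. It assumes the script lives in `$SPLUNK_HOME/bin` and is launched through `splunk cmd python`, and it uses the `-app`, `-name`, `-et`, `-lt`, `-auth` and `-dedup` flags as described in the backfill documentation linked above. The wrapper function, saved-search name, app and credentials are hypothetical placeholders, and the flags should be checked against your Splunk version.

```python
import shlex


def backfill_argv(splunk_home, app, search_name, auth,
                  earliest="-2d@d", latest="@h", dedup=True):
    """Build the argv for one fill_summary_index.py run (hypothetical wrapper)."""
    argv = [
        f"{splunk_home}/bin/splunk", "cmd", "python",
        f"{splunk_home}/bin/fill_summary_index.py",
        "-app", app,
        "-name", search_name,
        "-et", earliest,
        "-lt", latest,
        # placeholder "user:password"; the job runs unattended under cron
        "-auth", auth,
    ]
    if dedup:
        # skip periods that already have summary data
        argv += ["-dedup", "true"]
    return argv


if __name__ == "__main__":
    # Placeholder values; a nightly cron entry would run this script.
    cmd = backfill_argv("/opt/splunk", "search", "hourly_summary", "admin:changeme")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would actually launch the backfill
```

With `-dedup true` the script skips periods that already have summary data, so it fills gaps but does not refresh hours that later received late events.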
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98224#M1005</link>
      <description>&lt;P&gt;Yeah. Part of why I'm uncomfortable with my current method is that it at least triples the summary data (each hour is summarized once by the hourly search and twice by the nightly one), and I'm thinking of widening the hourly search to &lt;CODE&gt;et=-3h@h lt=now&lt;/CODE&gt;, which could potentially increase it sixfold.&lt;/P&gt;

&lt;P&gt;This data is pretty tiny (raw: 1.6 MB per day), so it's not really a problem yet, but we're looking at expanding this to where we could get a few GB per day, and at that point it would be more problematic. Add to that, writing old data into a new bucket can have performance implications, etc.&lt;/P&gt;

&lt;P&gt;It just seems that there should be a better way =D&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2011 23:39:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98224#M1005</guid>
      <dc:creator>David</dc:creator>
      <dc:date>2011-05-10T23:39:20Z</dc:date>
    </item>
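The triplication and sixfold estimates can be checked with a small Python model of the schedule. It is a hypothetical simulation, not Splunk code: it assumes epoch seconds in UTC, the hourly search firing at :30, the nightly search firing a few minutes after midnight, and it counts a run as summarizing an hour if its window overlaps any part of that hour.

```python
from collections import Counter

HOUR = 3600
DAY = 24 * HOUR


def snap(t, unit):
    """Round an epoch timestamp down to its hour or day boundary (like @h / @d)."""
    return t - t % unit


def run_windows(days, hourly_lt_now=False):
    """Return the (earliest, latest) window of every scheduled run over `days` days."""
    runs = []
    for h in range(days * 24):
        now = h * HOUR + 30 * 60  # hourly search fires at :30
        et = snap(now, HOUR) - 3 * HOUR  # et=-3h@h
        lt = now if hourly_lt_now else snap(now, HOUR) - 2 * HOUR  # lt=now or lt=-2h@h
        runs.append((et, lt))
    for d in range(1, days + 1):
        now = d * DAY + 5 * 60  # nightly search fires just after midnight
        runs.append((snap(now, DAY) - 2 * DAY, snap(now, HOUR)))  # et=-2d@d lt=@h
    return runs


def coverage(runs):
    """Count how many runs write a summary row for each hour they overlap."""
    counts = Counter()
    for et, lt in runs:
        for start in range(snap(et, HOUR), lt, HOUR):
            counts[start] += 1
    return counts


if __name__ == "__main__":
    mid = (2 * 24 + 10) * HOUR  # 10:00 on day 2, away from the simulation edges
    print(coverage(run_windows(5))[mid])  # 3
    print(coverage(run_windows(5, hourly_lt_now=True))[mid])  # 6
```

With the current schedule each hour is summarized 3 times (once hourly, twice nightly); with `lt=now`, four overlapping hourly runs touch each hour, giving 6.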
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98225#M1006</link>
      <description>&lt;P&gt;What about running the script with &lt;CODE&gt;-dedup true&lt;/CODE&gt;?&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2011 23:42:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98225#M1006</guid>
      <dc:creator>ftk</dc:creator>
      <dc:date>2011-05-10T23:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98226#M1007</link>
      <description>&lt;P&gt;Late reply: what I really want is probably a forced overwrite. With &lt;CODE&gt;-dedup true&lt;/CODE&gt; the script will not run summary indexing for periods that are already indexed. One way to improve my method would be to force indexing to rerun for periods that have already been indexed and overwrite the data already present. But based on my understanding of bucketing, I don't think Splunk has any such functionality.&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2011 00:35:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98226#M1007</guid>
      <dc:creator>David</dc:creator>
      <dc:date>2011-05-17T00:35:54Z</dc:date>
    </item>
    <item>
      <title>Re: Best Practice for Updating Summary Indexed Data</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98227#M1008</link>
      <description>&lt;P&gt;Interesting. Have you considered filing an enhancement request?&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2011 00:39:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Best-Practice-for-Updating-Summary-Indexed-Data/m-p/98227#M1008</guid>
      <dc:creator>ftk</dc:creator>
      <dc:date>2011-05-17T00:39:15Z</dc:date>
    </item>
  </channel>
</rss>

