<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster? in Deployment Architecture</title>
    <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320200#M12050</link>
    <description>&lt;P&gt;I see ;-) Isn't it as simple as adding a time buffer so that all events are safely in the platform, meaning a delay of 15-30 minutes?&lt;/P&gt;

&lt;P&gt;-- Run every 30 minutes for last 30 minutes not only may give you gaps but duplicates by accounting same events for two consecutive windows with overlap&lt;/P&gt;

&lt;P&gt;Why? &lt;/P&gt;</description>
    <pubDate>Mon, 23 Oct 2017 14:27:53 GMT</pubDate>
    <dc:creator>ddrillic</dc:creator>
    <dc:date>2017-10-23T14:27:53Z</dc:date>
    <item>
      <title>How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320196#M12046</link>
      <description>&lt;P&gt;We run into situations where summary indexes are incomplete because of indexing latency in the cluster.&lt;/P&gt;

&lt;P&gt;We usually set the same number of minutes for the &lt;STRONG&gt;Earliest&lt;/STRONG&gt; and the &lt;STRONG&gt;Run every&lt;/STRONG&gt; parameters... &lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/3704iDADA4F30D73C5020/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;What can be done? I think the issue is that the latency varies throughout the day and the week.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2017 17:43:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320196#M12046</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-10-20T17:43:23Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320197#M12047</link>
      <description>&lt;P&gt;Hi @ddrillic,&lt;/P&gt;

&lt;P&gt;If you don't need the most recent data in the summary index, I'd suggest changing the earliest time to &lt;CODE&gt;-60m@m&lt;/CODE&gt; and the latest time to &lt;CODE&gt;-30m@m&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 09:31:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320197#M12047</guid>
      <dc:creator>harsmarvania57</dc:creator>
      <dc:date>2017-10-23T09:31:54Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320198#M12048</link>
      <description>&lt;P&gt;@ddrillic, ideally you should summarize the previous time window, chosen according to your data flow, so that the search runs only after all events for that window have arrived. For example, during the current hour summarize the last hour, or during the current day summarize yesterday.&lt;/P&gt;

&lt;P&gt;Running every 30 minutes over the last 30 minutes may give you not only gaps but also duplicates, with the same events counted in two consecutive overlapping windows. So please make sure you understand the data flow and frequency, and the need for summary indexing, before kicking off summaries.&lt;/P&gt;
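
&lt;P&gt;The gap risk can be sketched in a few lines of Python. This is only an illustrative model, not Splunk code: the function names, the 5-minute indexing latency and the event timestamps are assumptions made up for the example. It compares a search window that ends at run time with one that lags by 30 minutes (the &lt;CODE&gt;-60m@m&lt;/CODE&gt; to &lt;CODE&gt;-30m@m&lt;/CODE&gt; suggestion from earlier in the thread).&lt;/P&gt;

```python
from datetime import datetime, timedelta


def summarized(event_time, indexed_at, run_at, earliest_back, latest_back):
    """Return True if the run at run_at picks up the event.

    The run covers event times in [run_at - earliest_back, run_at - latest_back)
    and can only see events that were already indexed when it started.
    """
    start = run_at - earliest_back
    end = run_at - latest_back
    in_window = event_time >= start and end > event_time
    visible = run_at >= indexed_at
    return in_window and visible


def caught_by_any_run(event_time, indexed_at, runs, earliest_back, latest_back):
    """Return True if at least one scheduled run summarizes the event."""
    return any(
        summarized(event_time, indexed_at, run, earliest_back, latest_back)
        for run in runs
    )


# Hypothetical event: timestamped 11:58 but only indexed at 12:03.
event_time = datetime(2017, 10, 23, 11, 58)
indexed_at = datetime(2017, 10, 23, 12, 3)

# Runs every 30 minutes: 11:30, 12:00, 12:30, 13:00.
first_run = datetime(2017, 10, 23, 11, 30)
runs = [first_run + timedelta(minutes=30 * i) for i in range(4)]

# Window of "last 30 minutes up to now": the 12:00 run is too early to see the event.
no_buffer = caught_by_any_run(
    event_time, indexed_at, runs, timedelta(minutes=30), timedelta(0)
)

# Window lagging by 30 minutes: the 12:30 run covers 11:30-12:00 after indexing.
with_buffer = caught_by_any_run(
    event_time, indexed_at, runs, timedelta(minutes=60), timedelta(minutes=30)
)

print(no_buffer, with_buffer)
```

&lt;P&gt;In this model the late-indexed event is never summarized without a buffer, because the only run whose window contains 11:58 starts before the event is indexed. With the lagged window the same event is picked up by the 12:30 run. The buffer only helps if it is at least as long as the worst indexing latency you expect.&lt;/P&gt;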

&lt;P&gt;If you want your dashboards to show details from either the real-time index or the summary index, you should create a switch for the summary index based on the time range selected. Here is an answer describing this kind of switch: &lt;A href="https://answers.splunk.com/answers/578984/running-one-of-two-searches-based-on-time-picker-s.html"&gt;https://answers.splunk.com/answers/578984/running-one-of-two-searches-based-on-time-picker-s.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Refer to the Splunk developer video, which explains this in detail: &lt;A href="https://www.splunk.com/view/SP-CAAACZW"&gt;https://www.splunk.com/view/SP-CAAACZW&lt;/A&gt; &lt;BR /&gt;
and also to the Splunk documentation: &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing#Schedule_the_populating_report_to_avoid_data_gaps_and_overlaps"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing#Schedule_the_populating_report_to_avoid_data_gaps_and_overlaps&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 09:58:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320198#M12048</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-10-23T09:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320199#M12049</link>
      <description>&lt;P&gt;Very interesting - now it's clear to me that setting &lt;STRONG&gt;Latest&lt;/STRONG&gt; to now - @m is not practical - much appreciated.  &lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 14:25:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320199#M12049</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-10-23T14:25:26Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320200#M12050</link>
      <description>&lt;P&gt;I see ;-) Isn't it as simple as adding a time buffer so that all events are safely in the platform, meaning a delay of 15-30 minutes?&lt;/P&gt;

&lt;P&gt;-- Run every 30 minutes for last 30 minutes not only may give you gaps but duplicates by accounting same events for two consecutive windows with overlap&lt;/P&gt;

&lt;P&gt;Why? &lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 14:27:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320200#M12050</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-10-23T14:27:53Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320201#M12051</link>
      <description>&lt;P&gt;@ddrillic, sorry, I overthought that one. There would just be gaps, no duplicates. If your schedule is a cron running every 30 minutes, it might run a bit late (depending on priority and load on the server), say 12:01 instead of 12:00, then 12:33 instead of 12:30, and so on throughout the day. Duplicates should not occur, since a scheduled run will never be early. You can test the schedule with mock queries that do not summarize events (if you are using the &lt;CODE&gt;collect&lt;/CODE&gt; command, you can set &lt;CODE&gt;testmode=true&lt;/CODE&gt; so that the scheduled search executes and generates stats but does not push data to the summary index). Testing with the &lt;CODE&gt;collect&lt;/CODE&gt; command also lets you push data to your own summary index, which you can get rid of after testing.&lt;/P&gt;
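
&lt;P&gt;As a rough illustration of the late-run scenario, here is a small Python sketch. It is not Splunk code, and it assumes that each 30-minute window ends at the moment the run actually starts, which is the premise of the example above; whether your scheduler anchors the window to the actual or the scheduled time is something to verify in your own environment. The helper names are made up for the illustration.&lt;/P&gt;

```python
from datetime import datetime, timedelta


def windows(run_times, span):
    """Build one (start, end) window per run, ending at the run's start time."""
    return [(run - span, run) for run in run_times]


def gaps(wins):
    """Return the uncovered (start, end) stretches between consecutive windows."""
    found = []
    for (_, prev_end), (next_start, _) in zip(wins, wins[1:]):
        if next_start > prev_end:
            found.append((prev_end, next_start))
    return found


day = datetime(2017, 10, 23)
span = timedelta(minutes=30)

# Run times from the example: 12:01 instead of 12:00, then 12:33 instead of 12:30.
late_runs = [day.replace(hour=12, minute=1), day.replace(hour=12, minute=33)]
# The same schedule with no delay, for comparison.
on_time_runs = [day.replace(hour=12, minute=0), day.replace(hour=12, minute=30)]

late_gaps = gaps(windows(late_runs, span))
on_time_gaps = gaps(windows(on_time_runs, span))

for start, end in late_gaps:
    print(start.time(), "to", end.time())
```

&lt;P&gt;Under that assumption, the late runs leave the two minutes from 12:01 to 12:03 unsummarized, while the on-time runs leave no gap.&lt;/P&gt;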

&lt;P&gt;To be on the safe side, say your cron is set to run at the 5th minute of every hour and earliest and latest are set to pull data from &lt;CODE&gt;-1h@h&lt;/CODE&gt; to &lt;CODE&gt;-0h@h&lt;/CODE&gt;: you will not have gaps, and if your data input is impacted, you will have an hour to resolve the issue (I think a similar example is in the video linked above, with the window being -2h@h to -1h@h to be even safer &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt; ). However, if your requirement is to summarize data every 30 minutes, you can do the same, as long as you allow a buffer based on the ingestion delay for that window.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 16:10:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320201#M12051</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-10-23T16:10:45Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320202#M12052</link>
      <description>&lt;P&gt;Gorgeous as usual @niketnilay - please convert to an answer.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 16:36:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320202#M12052</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-10-23T16:36:36Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320203#M12053</link>
      <description>&lt;P&gt;@ddrillic, thanks for the kind words. For answers on topics like these I usually wait for the &lt;CODE&gt;Gurus&lt;/CODE&gt; to chime in, correct or approve before I convert to an answer &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; I hope my input gave you what you were looking for. I have converted it to an answer!&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 17:14:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320203#M12053</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-10-23T17:14:49Z</dc:date>
    </item>
    <item>
      <title>Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320204#M12054</link>
      <description>&lt;P&gt;Much appreciated @niketnilay  !!!&lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 17:31:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/How-can-we-avoid-data-loss-in-the-summary-indexes-when-there-is/m-p/320204#M12054</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2017-10-23T17:31:33Z</dc:date>
    </item>
  </channel>
</rss>

