<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Estimating index storage requirements? in Deployment Architecture</title>
    <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49426#M1554</link>
    <description>&lt;P&gt;Thanks &lt;EM&gt;d&lt;/EM&gt;, I was probably going to err on the side of caution anyway, but this is the answer I was looking for. Cheers &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Sep 2012 00:30:47 GMT</pubDate>
    <dc:creator>rturk</dc:creator>
    <dc:date>2012-09-03T00:30:47Z</dc:date>
    <item>
      <title>Estimating index storage requirements?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49424#M1552</link>
      <description>&lt;P&gt;Hi Splunkers,&lt;/P&gt;

&lt;P&gt;I've been writing design documents for a fairly large distributed Splunk deployment:&lt;BR /&gt;
 - 100GB/day license&lt;BR /&gt;
 - 2 geographically separated sets of 2 VM indexers (w/ direct attached storage)&lt;BR /&gt;
 - 90 days retention required&lt;/P&gt;

&lt;P&gt;I'm now at the point of estimating how much storage to give each indexer (assuming the load is shared evenly among them), but I've come across a bit of a contradiction in the docs:&lt;/P&gt;

&lt;P&gt;From &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/installation/capacityplanningforalargersplunkdeployment"&gt;Hardware Capacity Planning for your Splunk Deployment&lt;/A&gt;:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;At a high level, total storage is calculated as follows:&lt;BR /&gt;
     daily average rate x retention policy x 1/2&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;So, given my specs above:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;100GB/day x 90 days x 1/2 = 4.5TB total storage required; split across 4 indexers = 1.125TB/indexer
&lt;/CODE&gt;&lt;/PRE&gt;
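&lt;P&gt;A minimal Python sketch of the rule-of-thumb arithmetic above. It assumes decimal units (1TB = 1,000GB, as in the figures above) and an even spread of data across the indexers. The function name and parameters are illustrative and are not part of any Splunk API:&lt;/P&gt;

```python
def half_rule_storage_gb(daily_gb: float, retention_days: int, indexers: int) -> tuple[float, float]:
    """Rule of thumb from the capacity-planning doc: daily rate x retention x 1/2.

    Returns (total_gb, per_indexer_gb), assuming data is spread evenly
    across the indexers (an assumption, not something Splunk guarantees).
    """
    total_gb = daily_gb * retention_days * 0.5
    return total_gb, total_gb / indexers


# 100GB/day, 90 days retention, 4 indexers (decimal units: 1TB = 1,000GB)
total, per_indexer = half_rule_storage_gb(100, 90, 4)
print(f"total: {total / 1000:.3f}TB, per indexer: {per_indexer / 1000:.3f}TB")
```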

&lt;P&gt;BUT, from &lt;A href="http://docs.splunk.com/Documentation/Splunk/4.3.3/Installation/Estimateyourstoragerequirements"&gt;Estimate your storage requirements&lt;/A&gt;:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Typically, the compressed rawdata file is approximately 10% the size of the incoming, pre-indexed raw data. The associated index files range in size from approximately 10% to 110% of the rawdata file.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;So, given my same specs:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;100GB/day x 90 days = 9TB total raw data to be indexed
(9TB x 10%) + ((9TB x 10%) x 110%) = 1,890GB total storage (upper end of the range); split across 4 indexers = 472.5GB/indexer
&lt;/CODE&gt;&lt;/PRE&gt;
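&lt;P&gt;The same calculation as a minimal Python sketch. It assumes decimal units, an even spread across indexers, and the 10% rawdata ratio quoted above, and it evaluates both ends of the 10% to 110% index-file range. Names are illustrative only:&lt;/P&gt;

```python
def ratio_storage_gb(daily_gb: float, retention_days: int, indexers: int,
                     rawdata_ratio: float = 0.10, index_ratio: float = 1.10) -> tuple[float, float]:
    """Estimate storage from the per-component ratios quoted in the docs.

    rawdata_ratio: compressed rawdata size as a fraction of incoming raw data.
    index_ratio: index files as a fraction of the compressed rawdata
                 (the quoted range is 0.10 to 1.10).
    Returns (total_gb, per_indexer_gb), assuming an even spread across indexers.
    """
    raw_gb = daily_gb * retention_days          # uncompressed data over the retention period
    rawdata_gb = raw_gb * rawdata_ratio         # compressed rawdata file
    total_gb = rawdata_gb + rawdata_gb * index_ratio  # rawdata plus index files
    return total_gb, total_gb / indexers


# Evaluate both ends of the quoted index-file range
for label, ratio in (("low", 0.10), ("high", 1.10)):
    total, per_indexer = ratio_storage_gb(100, 90, 4, index_ratio=ratio)
    print(f"{label}: {total:,.1f}GB total, {per_indexer:,.1f}GB per indexer")
```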

&lt;P&gt;Have I missed something here? Which recommendation am I meant to run with?&lt;/P&gt;</description>
      <pubDate>Sun, 02 Sep 2012 13:28:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49424#M1552</guid>
      <dc:creator>rturk</dc:creator>
      <dc:date>2012-09-02T13:28:36Z</dc:date>
    </item>
    <item>
      <title>Re: Estimating index storage requirements?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49425#M1553</link>
      <description>&lt;P&gt;Yes, you have &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;, and the use of the word "index" is what is misleading you in this case. &lt;/P&gt;

&lt;P&gt;When raw data is indexed, for each bucket, at a minimum, we store: &lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;an index structure associated with it (think of the index at the back of a book)&lt;/LI&gt;
&lt;LI&gt;a compressed file which contains the actual raw data (this is where your events are stored).&lt;BR /&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;So the math goes like this: &lt;/P&gt;

&lt;P&gt;index size = (index structure) + (compressed raw data) ≈ 1/2 x (size of uncompressed raw data)&lt;/P&gt;
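&lt;P&gt;To relate this to the ratios quoted in the question, here is a minimal Python sketch that expresses each estimate as a fraction of the uncompressed raw data. It assumes the 10% rawdata ratio and the 10% to 110% index-file range from the 4.3.3 doc. The function and constant names are illustrative only. Under those assumptions, the 1/2 rule sits above even the high end of the ratio-based range, so it leaves headroom:&lt;/P&gt;

```python
def on_disk_fraction(rawdata_ratio: float, index_ratio: float) -> float:
    """Fraction of the uncompressed raw data that ends up on disk:
    compressed rawdata plus index structures (index_ratio x compressed rawdata)."""
    return rawdata_ratio * (1 + index_ratio)


HALF_RULE = 0.5  # rule-of-thumb fraction from the capacity-planning doc

low = on_disk_fraction(0.10, 0.10)   # about 0.11 of the raw size
high = on_disk_fraction(0.10, 1.10)  # about 0.21 of the raw size
print(f"ratio-based range: {low:.2f} to {high:.2f} of raw; rule of thumb: {HALF_RULE}")
print(f"headroom factor at the high end: {HALF_RULE / high:.1f}x")
```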

&lt;P&gt;Given your specs, this is what you should use to calculate: &lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;100GB/day x 90 days x 1/2 = 4.5TB total storage required; split across 4 indexers = 1.125TB/indexer&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Hope this helps, &lt;/P&gt;

&lt;P&gt;d.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Sep 2012 15:21:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49425#M1553</guid>
      <dc:creator>_d_</dc:creator>
      <dc:date>2012-09-02T15:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: Estimating index storage requirements?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49426#M1554</link>
      <description>&lt;P&gt;Thanks &lt;EM&gt;d&lt;/EM&gt;, I was probably going to err on the side of caution anyway, but this is the answer I was looking for. Cheers &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Sep 2012 00:30:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49426#M1554</guid>
      <dc:creator>rturk</dc:creator>
      <dc:date>2012-09-03T00:30:47Z</dc:date>
    </item>
    <item>
      <title>Re: Estimating index storage requirements?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49427#M1555</link>
      <description>&lt;P&gt;&lt;EM&gt;d&lt;/EM&gt;,&lt;BR /&gt;
I am trying to work out how the data size changes between before and after indexing. Could you explain how you arrived at your math?&lt;BR /&gt;
I calculated the size from the documentation in the same way R.Turk did:&lt;BR /&gt;
raw data size: 9TB&lt;BR /&gt;
"rawdata file size": 9TB x 10%&lt;BR /&gt;
Minimum index size: (9TB x 10%) + ((9TB x 10%) x 10%)&lt;BR /&gt;
Maximum index size: (9TB x 10%) + ((9TB x 10%) x 110%)&lt;BR /&gt;
Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2013 06:04:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49427#M1555</guid>
      <dc:creator>lzhang_soliton</dc:creator>
      <dc:date>2013-03-15T06:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: Estimating index storage requirements?</title>
      <link>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49428#M1556</link>
      <description>&lt;P&gt;I think you all missed the question of which replication_factor is used here and, possibly, whether multisite cluster replication is used.&lt;/P&gt;

&lt;P&gt;If multisite replication is not used and the replication factor is 4, every log event will reside on every node, so each node will need 4.5TB of disk.&lt;/P&gt;
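&lt;P&gt;A minimal Python sketch of this multiplication. It assumes the 4.5TB single-copy figure from earlier in the thread, copies spread evenly across the 4 nodes, and every copy counted at full size, as this post does. The function name is illustrative only. With RF=4, the per-node figure matches the 4.5TB above. With RF=2, the even-spread total is 9TB, whereas the scenario below concentrates it on two nodes:&lt;/P&gt;

```python
def clustered_storage_tb(base_tb: float, replication_factor: int, nodes: int) -> tuple[float, float]:
    """Return (total_tb, per_node_tb) for a cluster that keeps
    replication_factor copies of every bucket.

    Assumes every copy is full-size and copies are spread evenly across nodes.
    """
    if replication_factor not in range(1, nodes + 1):
        raise ValueError("replication_factor must be between 1 and the number of nodes")
    total_tb = base_tb * replication_factor  # every bucket stored replication_factor times
    return total_tb, total_tb / nodes


# 4.5TB single-copy estimate, 4 nodes, two replication factors
for rf in (2, 4):
    total, per_node = clustered_storage_tb(4.5, rf, 4)
    print(f"RF={rf}: {total:.2f}TB in total, {per_node:.3f}TB per node if evenly spread")
```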

&lt;P&gt;Same scenario with a replication factor of 2: if only one node receives logs, 2 nodes will use 4.5TB and the other 2 nodes will receive nothing. Replication pair selection is automatic; you can see it by searching _audit or under Settings-&amp;gt;Indexes on each node.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jul 2014 02:43:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Deployment-Architecture/Estimating-index-storage-requirements/m-p/49428#M1556</guid>
      <dc:creator>theunf</dc:creator>
      <dc:date>2014-07-25T02:43:26Z</dc:date>
    </item>
  </channel>
</rss>

