<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How does compression work and what should I expect to see in volume of data as it is stored in an index in SPLUNK 6.1.2 and 6.2? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-does-compression-work-and-what-should-I-expect-to-see-in/m-p/129142#M26507</link>
    <description>&lt;P&gt;I am trying to understand how the volume of data I ingest into Splunk compares with its size once it is stored in a Splunk index. Some of the articles I have been reading suggest that I should see up to 50% compression.&lt;/P&gt;

&lt;P&gt;I have ingested the following data into Splunk 6.1.2 and 6.2 instances:&lt;/P&gt;

&lt;P&gt;959 files, which together contain 990,978 rows of data. On a Unix disk this amounts to 108 MB. The structure of the data is shown below:&lt;/P&gt;

&lt;P&gt;C,2444384447, 2444384447,383333135115,00383333135115,44,380,20121119213215000000,20121119225657410000,5082410&lt;BR /&gt;
C,1444861393, 1444861393,1255553202,01233333202,44,44,20121119215011000000,20121119225324010000,3793010&lt;BR /&gt;
C,2444761741, 2444761741,18999922048,0018999922048,44,1876,20121119215041000000,20121119225044000000,3603000&lt;BR /&gt;
C,2344413095, 2344413095,2366668501,02344444501,44,44,20121119220837000000,20121119223846340000,1809340&lt;BR /&gt;
C,2044401174, 2044401174,9057777030,09066660030,44,44,20121119221700000000,20121119221959060000,179060&lt;/P&gt;

&lt;P&gt;However, when I examine the index after this load, it has grown by 433 MB and shows an event count of 990,019.&lt;/P&gt;

&lt;P&gt;This clearly does not show any compression.&lt;/P&gt;

&lt;P&gt;Any ideas on the theory of compression, or on what I might have done wrong?&lt;/P&gt;</description>
    <pubDate>Fri, 21 Nov 2014 09:27:47 GMT</pubDate>
    <dc:creator>garryclarke</dc:creator>
    <dc:date>2014-11-21T09:27:47Z</dc:date>
    <item>
      <title>How does compression work and what should I expect to see in volume of data as it is stored in an index in SPLUNK 6.1.2 and 6.2?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-does-compression-work-and-what-should-I-expect-to-see-in/m-p/129142#M26507</link>
      <description>&lt;P&gt;I am trying to understand how the volume of data I ingest into Splunk compares with its size once it is stored in a Splunk index. Some of the articles I have been reading suggest that I should see up to 50% compression.&lt;/P&gt;

&lt;P&gt;I have ingested the following data into Splunk 6.1.2 and 6.2 instances:&lt;/P&gt;

&lt;P&gt;959 files, which together contain 990,978 rows of data. On a Unix disk this amounts to 108 MB. The structure of the data is shown below:&lt;/P&gt;

&lt;P&gt;C,2444384447, 2444384447,383333135115,00383333135115,44,380,20121119213215000000,20121119225657410000,5082410&lt;BR /&gt;
C,1444861393, 1444861393,1255553202,01233333202,44,44,20121119215011000000,20121119225324010000,3793010&lt;BR /&gt;
C,2444761741, 2444761741,18999922048,0018999922048,44,1876,20121119215041000000,20121119225044000000,3603000&lt;BR /&gt;
C,2344413095, 2344413095,2366668501,02344444501,44,44,20121119220837000000,20121119223846340000,1809340&lt;BR /&gt;
C,2044401174, 2044401174,9057777030,09066660030,44,44,20121119221700000000,20121119221959060000,179060&lt;/P&gt;

&lt;P&gt;However, when I examine the index after this load, it has grown by 433 MB and shows an event count of 990,019.&lt;/P&gt;

&lt;P&gt;This clearly does not show any compression.&lt;/P&gt;

&lt;P&gt;Any ideas on the theory of compression, or on what I might have done wrong?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Nov 2014 09:27:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-does-compression-work-and-what-should-I-expect-to-see-in/m-p/129142#M26507</guid>
      <dc:creator>garryclarke</dc:creator>
      <dc:date>2014-11-21T09:27:47Z</dc:date>
    </item>
    <item>
      <title>Re: How does compression work and what should I expect to see in volume of data as it is stored in an index in SPLUNK 6.1.2 and 6.2?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-does-compression-work-and-what-should-I-expect-to-see-in/m-p/129143#M26508</link>
      <description>&lt;P&gt;Index size on disk has three main components.&lt;/P&gt;

&lt;P&gt;Compressed raw data - depending on your data, that might be 10-15% of the indexed volume.&lt;BR /&gt;
Index structures - depending on your data, that might be 25-150% of the indexed volume.&lt;BR /&gt;
Acceleration summaries - depending on your data and the accelerations you're using (report, datamodel), that might add a few percent on top.&lt;/P&gt;
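
&lt;P&gt;To put those rules of thumb against your numbers, here is a quick back-of-the-envelope calculation in Python. It only applies the percentage ranges above to your 108 MB of raw input and ignores acceleration summaries; the ranges are rough observations, not guarantees, and the variable names are just for illustration.&lt;/P&gt;

```python
# Rough estimate of on-disk index size from the rule-of-thumb ranges above:
# compressed raw data ~10-15% of raw input, index structures ~25-150%.
# Acceleration summaries are ignored. These are heuristics, not guarantees.

RAW_MB = 108.0       # raw input size reported in the question
OBSERVED_MB = 433.0  # index growth reported in the question

RAWDATA_FRACTION = (0.10, 0.15)  # compressed raw data, as a fraction of raw input
INDEX_FRACTION = (0.25, 1.50)    # index structures, as a fraction of raw input


def estimate_range(raw_mb):
    """Return the (low, high) estimated on-disk size in MB."""
    low = raw_mb * (RAWDATA_FRACTION[0] + INDEX_FRACTION[0])
    high = raw_mb * (RAWDATA_FRACTION[1] + INDEX_FRACTION[1])
    return low, high


low, high = estimate_range(RAW_MB)
ratio = OBSERVED_MB / RAW_MB  # observed disk-to-raw ratio

print(f"expected roughly {low:.1f}-{high:.1f} MB on disk")
# prints: expected roughly 37.8-178.2 MB on disk
print(f"observed {OBSERVED_MB:.0f} MB, disk-to-raw ratio {ratio:.2f}")
# prints: observed 433 MB, disk-to-raw ratio 4.01
```

&lt;P&gt;Even the top of that range (about 178 MB) is well short of the 433 MB you are seeing, which is a disk-to-raw ratio of roughly 4.0. That points to something unusual about the data or the indexing settings rather than ordinary overhead.&lt;/P&gt;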

&lt;P&gt;In the wild I've seen disk-to-raw ratios anywhere from &amp;lt;10% to &amp;gt;200%; it really depends on your data.&lt;/P&gt;

&lt;P&gt;To inspect your own indexes quickly, you can use a search like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| dbinspect index=*
| stats sum(rawSize) as rawSize sum(sizeOnDiskMB) as sizeOnDiskMB by index
| eval rawSize = rawSize / 1048576
| eval ratio = sizeOnDiskMB / rawSize
&lt;/CODE&gt;&lt;/PRE&gt;
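
&lt;P&gt;If it helps to see what that search computes, here is the same aggregation sketched in Python. It assumes bucket rows carrying the three dbinspect fields the search uses (index, rawSize in bytes, sizeOnDiskMB); the function name and the sample rows are made up for illustration.&lt;/P&gt;

```python
from collections import defaultdict

BYTES_PER_MB = 1048576  # same divisor as the search: 1024 * 1024


def disk_to_raw_ratio(buckets):
    """Sum rawSize (bytes) and sizeOnDiskMB per index, like the stats
    command, then return {index: sizeOnDiskMB / rawSize_in_MB}.
    Indexes with zero raw size are skipped to avoid dividing by zero."""
    raw_bytes = defaultdict(int)
    disk_mb = defaultdict(float)
    for bucket in buckets:
        raw_bytes[bucket["index"]] += bucket["rawSize"]
        disk_mb[bucket["index"]] += bucket["sizeOnDiskMB"]
    return {
        idx: disk_mb[idx] / (raw_bytes[idx] / BYTES_PER_MB)
        for idx in raw_bytes
        if raw_bytes[idx]
    }


# Hypothetical bucket rows, for illustration only (52428800 bytes = 50 MB).
sample = [
    {"index": "main", "rawSize": 52428800, "sizeOnDiskMB": 25.0},
    {"index": "main", "rawSize": 52428800, "sizeOnDiskMB": 35.0},
    {"index": "test", "rawSize": 104857600, "sizeOnDiskMB": 400.0},
]
print(disk_to_raw_ratio(sample))
# prints: {'main': 0.6, 'test': 4.0}
```

&lt;P&gt;Dividing rawSize by 1048576 puts both sides of the ratio in MB, so a ratio of 1.0 would mean the index takes as much disk space as the raw data it came from.&lt;/P&gt;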

&lt;P&gt;In the long run, consider using Fire Brigade to monitor your indexes: &lt;A href="https://apps.splunk.com/app/1632/"&gt;https://apps.splunk.com/app/1632/&lt;/A&gt; along with &lt;A href="https://apps.splunk.com/app/1633/"&gt;https://apps.splunk.com/app/1633/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;What kind of files are you ingesting? Are you using any special settings, such as a lot of indexed fields? Silly question: are you indexing archive files?&lt;/P&gt;</description>
      <pubDate>Sat, 22 Nov 2014 18:40:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-does-compression-work-and-what-should-I-expect-to-see-in/m-p/129143#M26508</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-11-22T18:40:10Z</dc:date>
    </item>
  </channel>
</rss>

