I am trying to understand what I should expect to see regarding the volume of data I ingest into SPLUNK and its volume as it is stored in a SPLUNK index. Some of the articles I have been reading would suggest that I should see up to a 50% compression in size.
I have ingested into a SPLUNK 6.1.2 and 6.2 instance the following data:
959 files which in total contains 990978 rows of data. On Unix disc this equates to 108Meg worth of data. The structure of this data is as shown below:
Compressed raw data - depending on your data, that might be 10-15% of the indexed volume.
Index structures - depending on your data, that might be 25-150% of the indexed volume.
Acceleration summaries - depending on your data and the accelerations you're using (report, datamodel), that might add a few percent on top.
In the wild I've seen anything from <10% to >200% disk-to-raw ratio, it really depends on your data.
To inspect your own indexes quickly, you can use a search like this:
| dbinspect index=* | stats sum(rawSize) as rawSize sum(sizeOnDiskMB) as sizeOnDiskMB by index | eval rawSize = rawSize / 1048576 | eval ratio = sizeOnDiskMB / rawSize