
How does Splunk estimate the total raw data size?

Explorer

I had just set up Splunk with indexer clustering (RF=3, SF=2) with no data, then loaded a 1TB syslog file using oneshot. The "Index Detail: Deployment" page showed a total index size of 1121GB and a total raw data size (uncompressed) of 1783GB, giving a Raw to Index Size Ratio of 1.59:1.

My question is: how can a 1024GB (1TB) file be counted as 1783GB?


SplunkTrust

The index size doesn't depend only on the uncompressed raw data size. Splunk creates compressed raw data files as well as a set of index files that make the data searchable, and the index consists of both types of files. The compression ratio of the raw data files and the size of the index files depend on various factors. For more information, see the following documentation (the second link has an example of how Splunk calculates space).

http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/HowSplunkstoresindexes
http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Systemrequirements#Storage_requirement_exa...
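
For reference, the kind of storage estimate described in the second link can be sketched in a few lines of Python. This is only a rough estimate: it assumes the commonly quoted ballpark for syslog data of compressed rawdata at about 15% of the original size and index files at about 35%, and real ratios vary with the data. The function name and parameters are made up for illustration and are not part of Splunk.

```python
def estimate_cluster_storage_gb(raw_gb, replication_factor, search_factor,
                                rawdata_ratio=0.15, index_ratio=0.35):
    """Rough total disk usage across an indexer cluster, in GB.

    rawdata_ratio and index_ratio are ballpark assumptions, not measured values.
    """
    # Every replicated copy of a bucket stores the compressed rawdata.
    rawdata_gb = raw_gb * rawdata_ratio * replication_factor
    # Only searchable copies also store the index (tsidx) files.
    index_files_gb = raw_gb * index_ratio * search_factor
    return rawdata_gb + index_files_gb


# Values from this thread: 1024GB ingested, RF=3, SF=2.
estimate = estimate_cluster_storage_gb(1024, replication_factor=3, search_factor=2)
print(f"Estimated total index size: {estimate:.1f} GB")
```

With those assumed ratios the estimate comes to about 1177.6GB, which is in the same range as the 1121GB total index size reported above.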


Explorer

Sorry if I was not clear. I am not asking about the index size. I do understand the sizing calculations for rawdata (RF) and tsidx (SF) in clustered indexer mode. My question is specifically about the "Index Detail: Deployment" page, which shows the following information under "Index Structure Overview" (in 6.5.1).

  Indexers:                             8
  Total Index Size:                     1121GB
  Total Raw Data size (uncompressed):   1783GB
  Raw to Index Size Ratio:              1.59:1

My question is specifically how Splunk measures the "Total Raw Data size (uncompressed)". I just ingested a 1024GB syslog file, so I was hoping to see that as the total raw data size, not 1783GB.
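
To make the discrepancy concrete, here is the arithmetic on the numbers above as a small Python sketch. It only reproduces the figures shown on the page and compares them with the ingested file size; it does not explain how Splunk derives the raw data size. The variable names are my own.

```python
# Figures reported on the "Index Detail: Deployment" page (in GB).
total_index_gb = 1121
reported_raw_gb = 1783
# Size of the syslog file that was actually ingested (in GB).
ingested_gb = 1024

# The ratio shown on the page: reported raw size over total index size.
raw_to_index = reported_raw_gb / total_index_gb
# How much larger the reported raw size is than the ingested file.
reported_to_ingested = reported_raw_gb / ingested_gb
excess_gb = reported_raw_gb - ingested_gb

print(f"Raw to Index Size Ratio: {raw_to_index:.2f}:1")
print(f"Reported raw vs. ingested: {reported_to_ingested:.2f}x ({excess_gb}GB more)")
```

So the 1.59:1 ratio is consistent with the two sizes shown on the page, but the reported raw size is roughly 1.74 times the size of the file I loaded, which is the part I cannot account for.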


Explorer

You can see the screenshot here.

