How is the Index Size greater than the uncompressed raw data size?

Explorer

Uploaded file size: 717 MB
Current index size: 811 MB (Settings -> Data -> Indexes)
Index size: 0.79 GB (Monitoring Console -> Indexing -> Indexes and Volumes -> Index Detail: Instance -> Overview)

I expected the index to be smaller than the uploaded file, but it is larger.
Previously, when I uploaded the same file together with another 14 MB file, the index size was 706 MB; now it is the opposite. The data should have been compressed.

Can anybody please explain this?

Thanks and Regards

Re: How is the Index Size greater than the uncompressed raw data size?

SplunkTrust

Did you clean out the index between uploads? If not, the index now contains multiple copies of the uploaded file which might explain why it's bigger than the source.


Re: How is the Index Size greater than the uncompressed raw data size?

Explorer

Hi, there is only one file in the index.

File size: 717 MB
Index size: 811 MB

Re: How is the Index Size greater than the uncompressed raw data size?

Champion

Per @richgalloway, please clarify exactly what you did to the index between uploads. For example, the delete SPL command does not actually remove data from the index; it only makes the events unsearchable, so the space on disk is not reclaimed.

Re: How is the Index Size greater than the uncompressed raw data size?

Explorer

Hi, I just uploaded the file the usual way. I did not run any commands.

Re: How is the Index Size greater than the uncompressed raw data size?

SplunkTrust

Try uploading it to a new index and compare. Some data may have been left over from your previous uploads; uploading to a new index rules that out.

Re: How is the Index Size greater than the uncompressed raw data size?

Explorer

Hi, I deleted the previous index and cleaned the system, then uploaded the file again. The word count is correct: "3465010" on the Linux box as well as in Splunk.
But the index size is 945 MB and the file size is 731 MB.
I cannot understand how this is possible.

Re: How is the Index Size greater than the uncompressed raw data size?

Communicator

What is the file type: CSV or plain text?

Re: How is the Index Size greater than the uncompressed raw data size?

Legend

The size of the index on disk depends on several factors. It is entirely possible for the index to consume more space than the incoming file does. When Splunk indexes a file, it creates one or more buckets in the index. Each bucket contains two main kinds of files:

  • "rawdata" = the incoming data, plus timestamp, host, source and sourcetype, stored in a journaled, compressed file. The "rawdata" is compressed with gzip, so it is typically about 15% of the inbound data size, although the exact ratio depends on how well the incoming data compresses.

  • index files = the keyword index, the bloom filters, metadata files, and various other index files. The size of these files depends heavily on the number of unique keywords in the incoming data; indexed field extractions also increase it. The size can vary widely, but generally falls between 10% and 110% of the incoming data size.
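
To make the arithmetic concrete, here is a rough sketch in Python that applies the rule-of-thumb ratios above to the 717 MB upload from the question. The function name and its parameters are hypothetical, chosen only for illustration; the ratios are the approximate figures quoted above, not measured values, and real buckets can fall outside this range.

```python
def estimate_index_size_mb(raw_mb, rawdata_ratio=0.15, index_ratio_range=(0.10, 1.10)):
    """Return a (low, high) estimate of on-disk index size in MB.

    Hypothetical helper: the ratios are rules of thumb, not guarantees.
    rawdata_ratio      -- compressed rawdata as a fraction of the inbound data
    index_ratio_range  -- index files as a (min, max) fraction of the inbound data
    """
    rawdata_mb = raw_mb * rawdata_ratio
    low = rawdata_mb + raw_mb * index_ratio_range[0]
    high = rawdata_mb + raw_mb * index_ratio_range[1]
    return low, high


# Figures from the original question: 717 MB uploaded, 811 MB reported index size.
low, high = estimate_index_size_mb(717)
observed_mb = 811
within_range = low <= observed_mb <= high
print(f"estimated range: about {low:.0f} MB to {high:.0f} MB")
print(f"observed {observed_mb} MB within range: {within_range}")
```

Under these assumptions the estimate runs from roughly 179 MB to 896 MB, so an 811 MB index for a 717 MB file sits near the upper end of the range, which would be consistent with data that has many unique keywords or indexed field extractions.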

If the size of the index changed between uploads, perhaps someone created indexed field extractions.
