Uploaded File size: 717MB
Current Index size: 811MB ( Settings -> Data -> Indexes )
Index Size: 0.79 GB ( Monitoring Console -> Indexing -> Indexes and Volumes -> Index Detail Instance -> overview )
I expected the index to be smaller than the source file, but it is larger than the file I uploaded.
Previously, when I uploaded the same file together with another 14 MB file, the index size was 706 MB, which was smaller than the source. This time it is the opposite. The data should have been compressed.
Can anybody please explain this?
Thanks and Regards
Did you clean out the index between uploads? If not, the index now contains multiple copies of the uploaded file which might explain why it's bigger than the source.
Per @richgalloway, please clarify exactly what you did to the index between uploads. For example, the delete SPL command does not actually remove the data from the index; it only makes the events unsearchable, so no disk space is reclaimed.
Try uploading the file to a new index and compare. Some data may have been left over from your previous uploads, and uploading to a new index rules that out.
Hi, I deleted the previous index and cleaned the system, then tried uploading the file again. It shows the correct word count, "3465010", on the Linux box as well as in Splunk.
But the index size is 945 MB and the file size is 731 MB.
I don't understand how this is possible.
The size of the index on disk depends on several factors. It is entirely possible for the index to consume more space than the incoming file does. When Splunk indexes a file, it creates one or more buckets in the index. Each bucket contains two main kinds of files:
"rawdata" = the incoming data, plus timestamp, host, source and sourcetype, stored in a journaled, compressed file. The "rawdata" is compressed via gzip, so it generally equals about 15% of the inbound data size. However, this depends on how well the incoming data compresses.
index files = the keyword index, the bloom filters, metadata files, and various other index files. The size of these files depends heavily on the number of unique keywords in the incoming data; indexed field extractions also increase it. The size can vary widely, but generally falls between 10% and 110% of the incoming data size.
If the size of the index changed between uploads, perhaps someone created indexed field extractions.
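To put those percentages into numbers, here is a rough back-of-the-envelope sketch in Python. It only applies the rule-of-thumb ratios quoted above (about 15% for rawdata, 10% to 110% for index files) to the 731 MB file from this thread. The function name and its parameters are made up for illustration and are not part of Splunk.

```python
def estimate_index_size_mb(source_mb, rawdata_ratio=0.15,
                           index_ratio_range=(0.10, 1.10)):
    """Return a (low, high) estimate of on-disk index size in MB.

    Hypothetical helper: the ratios are the rules of thumb quoted in
    this thread, not values read from Splunk itself.
    """
    # Compressed rawdata journal, assumed to be ~15% of the source size.
    rawdata_mb = source_mb * rawdata_ratio
    # Index files (keyword index, bloom filters, metadata) vary widely.
    low = rawdata_mb + source_mb * index_ratio_range[0]
    high = rawdata_mb + source_mb * index_ratio_range[1]
    return low, high


# Figures reported in this thread: 731 MB source file, 945 MB index.
low, high = estimate_index_size_mb(731)
observed_ratio = 945 / 731

print(f"estimated range: {low:.0f} MB to {high:.0f} MB")
print(f"observed index/source ratio: {observed_ratio:.2f}")
```

Under these assumptions the estimate tops out a little above 900 MB, so a 945 MB index sits at or just past the high end of the usual range. That would be consistent with data that has many unique keywords or that compresses less well than typical, but it is only an estimate.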