Getting Data In

How is the Index Size greater than the uncompressed raw data size?

Deepali529
Explorer

Uploaded File size: 717MB
Current Index size: 811MB ( settings -> Data -> Indexes )
Index Size: 0.79 GB ( Monitoring Console -> Indexing -> Indexes and Volumes -> Index Detail Instance -> overview )

I expected the index size to be less than the file size, but it is larger than the uploaded file.
Previously, when I uploaded the same file combined with another 14 MB file, the index size was 706 MB, whereas now it is the opposite. The data should have been compressed.

Can anybody please explain this?

Thanks and Regards

0 Karma
1 Solution

lguinn2
Legend

The size of the index on disk depends on several factors. It is entirely possible for the index to consume more space than the incoming file does. When Splunk indexes a file, it creates one or more buckets in the index. Each bucket contains two main kinds of files:

  • "rawdata" = the incoming data, plus timestamp, host, source and sourcetype, stored in a journaled, compressed file. The "rawdata" is compressed via gzip, so it generally equals about 15% of the inbound data size. However, this depends on how well the incoming data compresses.

  • index files = the keyword index, the bloom filters, metadata files, and various other index files. The size of these files is highly dependent on the number of unique keywords in the incoming data; indexed field extractions also increase the size of the index files. The size of these files can vary widely, but generally falls between 10% - 110% of the incoming data size.

If the size of the index changed between uploads, perhaps someone created indexed field extractions.
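As a rough illustration using those percentages: for a 717 MB upload, the rawdata might compress to roughly 108 MB (about 15%), while the index files could land anywhere from roughly 72 MB to 790 MB, so a total on disk around 811 MB is entirely plausible. If you want to see the breakdown yourself, a search along these lines should show it (a sketch; "yourindex" is a placeholder, and the rawSize and sizeOnDiskMB fields come from dbinspect):

  | dbinspect index=yourindex | stats sum(rawSize) as raw_bytes, sum(sizeOnDiskMB) as disk_mb

rawSize is the uncompressed size of the raw events in each bucket and sizeOnDiskMB is the total bucket size on disk, so comparing the two shows how much of the footprint comes from the index files rather than from the compressed rawdata.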

richgalloway
SplunkTrust
SplunkTrust

Did you clean out the index between uploads? If not, the index now contains multiple copies of the uploaded file which might explain why it's bigger than the source.
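One quick way to check (a sketch; "yourindex" and the source value are placeholders for your own values) is to count events per source and see whether there are more events than the file actually contains:

  index=yourindex | stats count by source

If the count for your file is a multiple of what you expect, the same data has been indexed more than once.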

---
If this reply helps you, Karma would be appreciated.

rjthibod
Champion

Per @richgalloway, please clarify what exactly you did to the index between uploads. For example, using the delete SPL command does not actually remove the data from the index.
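As a sketch of the difference (the index name and source path are placeholders): a search such as

  index=yourindex source="/path/to/file" | delete

only marks the matching events as unsearchable; the disk space is not reclaimed, so the index size stays the same. Actually removing the data requires something like the CLI command "splunk clean eventdata -index yourindex" (run with Splunk stopped), which wipes the index entirely.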

0 Karma

Deepali529
Explorer

Hi, I just uploaded the file the way we usually do. I did not use any command.

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Try uploading it to a new index and compare. It is possible that some data was left over from your previous uploads, so uploading to a new index will ensure that doesn't happen. You can confirm the new index is genuinely empty before the upload; see the sketch below.
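Something like this should work (the index name is a placeholder):

  | dbinspect index=new_test_index | stats sum(sizeOnDiskMB) as disk_mb, sum(eventCount) as events

An empty index has no buckets, so both values should come back blank or zero; running the same search after the upload then gives a size measurement that cannot include leftovers from earlier tests.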

0 Karma

Deepali529
Explorer

Hi, I deleted the previous index and cleaned the system, then tried uploading the file again. It shows the correct word count of "3465010" on the Linux box as well as in Splunk.
But the index size is 945 MB and the file size is 731 MB.
I am not able to understand how this can be possible.

0 Karma

puneethgowda
Communicator

What is the file type, CSV or a text file?

0 Karma

Deepali529
Explorer

Hi, there is only one file present in the index.

File size: 717 MB
Index size: 811 MB

0 Karma