How does compression work, and how much disk space should I expect indexed data to take up in Splunk 6.1.2 and 6.2?

garryclarke
Path Finder

I am trying to understand how the volume of data I ingest into Splunk relates to the volume it occupies once it is stored in a Splunk index. Some of the articles I have been reading suggest that I should see up to 50% compression.

I have ingested the following data into Splunk 6.1.2 and 6.2 instances:

959 files containing 990,978 rows of data in total. On a Unix disk this amounts to 108 MB of data. The structure of the data is shown below:

C,2444384447, 2444384447,383333135115,00383333135115,44,380,20121119213215000000,20121119225657410000,5082410
C,1444861393, 1444861393,1255553202,01233333202,44,44,20121119215011000000,20121119225324010000,3793010
C,2444761741, 2444761741,18999922048,0018999922048,44,1876,20121119215041000000,20121119225044000000,3603000
C,2344413095, 2344413095,2366668501,02344444501,44,44,20121119220837000000,20121119223846340000,1809340
C,2044401174, 2044401174,9057777030,09066660030,44,44,20121119221700000000,20121119221959060000,179060

However, when I examine the index after this load, it has grown by 433 MB and displays an event count of 990,019.

This clearly does not demonstrate any compression.

Any ideas on the theory of compression, or on what I might have done wrong?


martin_mueller
SplunkTrust

Index size on disk has three main components.

Compressed raw data - depending on your data, that might be 10-15% of the indexed volume.
Index structures - depending on your data, that might be 25-150% of the indexed volume.
Acceleration summaries - depending on your data and the accelerations you're using (report, datamodel), that might add a few percent on top.

In the wild I've seen disk-to-raw ratios anywhere from under 10% to over 200%; it really depends on your data.
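As a rough sanity check, the percentages above can be applied to the 108 MB from the question. This is only a sketch under assumptions: the function name `estimate_disk_range_mb` is made up for illustration, the two main components are simply added together, and acceleration summaries are left out.

```python
def estimate_disk_range_mb(raw_mb, rawdata_pct=(10, 15), index_pct=(25, 150)):
    """Return a (low, high) estimate of on-disk size in MB.

    Assumption: on-disk size is approximated as compressed raw data plus
    index structures, each given as a percentage range of the raw volume.
    Acceleration summaries are ignored.
    """
    low = raw_mb * (rawdata_pct[0] + index_pct[0]) / 100
    high = raw_mb * (rawdata_pct[1] + index_pct[1]) / 100
    return low, high


raw_mb = 108        # raw volume reported in the question
observed_mb = 433   # index growth reported in the question

low, high = estimate_disk_range_mb(raw_mb)
observed_ratio = observed_mb / raw_mb

print(f"estimated on-disk size: {low:.1f} to {high:.1f} MB")
print(f"observed disk-to-raw ratio: {observed_ratio:.0%}")
```

With these ranges the estimate tops out well below the 433 MB observed, which works out to a ratio of roughly 400%. That does not prove anything is misconfigured, but it is a reason to look more closely at how the data is being indexed.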

To inspect your own indexes quickly, you can use a search like this:

| dbinspect index=* | stats sum(rawSize) as rawSize sum(sizeOnDiskMB) as sizeOnDiskMB by index | eval rawSize = rawSize / 1048576 | eval ratio = sizeOnDiskMB / rawSize
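For anyone who wants to follow the arithmetic outside of Splunk, here is a small Python sketch of what that search computes. The field names `rawSize` (bytes) and `sizeOnDiskMB` are taken from the search itself; the bucket rows, the index names and the function name `disk_to_raw_ratio` are made up purely for illustration.

```python
from collections import defaultdict

BYTES_PER_MB = 1048576  # same divisor as in the search


def disk_to_raw_ratio(buckets):
    """Sum rawSize and sizeOnDiskMB per index, then compute the ratio."""
    totals = defaultdict(lambda: {"rawSize": 0, "sizeOnDiskMB": 0.0})
    for bucket in buckets:
        entry = totals[bucket["index"]]
        entry["rawSize"] += bucket["rawSize"]
        entry["sizeOnDiskMB"] += bucket["sizeOnDiskMB"]

    result = {}
    for index, entry in totals.items():
        raw_mb = entry["rawSize"] / BYTES_PER_MB
        # Guard against empty buckets to avoid dividing by zero.
        ratio = entry["sizeOnDiskMB"] / raw_mb if raw_mb else None
        result[index] = {
            "rawSizeMB": raw_mb,
            "sizeOnDiskMB": entry["sizeOnDiskMB"],
            "ratio": ratio,
        }
    return result


# Hypothetical bucket rows, not real dbinspect output.
sample_buckets = [
    {"index": "main", "rawSize": 52428800, "sizeOnDiskMB": 20.0},
    {"index": "main", "rawSize": 52428800, "sizeOnDiskMB": 30.0},
    {"index": "test", "rawSize": 10485760, "sizeOnDiskMB": 25.0},
]

ratios = disk_to_raw_ratio(sample_buckets)
for index, row in ratios.items():
    print(index, row)
```

Note that the ratio here is a fraction rather than a percentage, so 0.5 means the index takes up half the space of the raw data and anything above 1.0 means it takes up more.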

In the long run, consider using Fire Brigade to monitor your indexes: https://apps.splunk.com/app/1632/ along with https://apps.splunk.com/app/1633/

What kind of files are you ingesting? Are any special settings in use, such as a lot of indexed fields? Silly question: are you indexing archive files?
