Getting Data In

Does Splunk index gzip files?

hulahoop
Splunk Employee

I'd like to index a directory of 50,000 gzip files. The files range in size from 1 KB to 5 MB. Can Splunk monitor these files without first unpacking the gzips?

1 Solution

hulahoop
Splunk Employee

The good news is "YES, Splunk can index gzip files as is!" The bad news is that Splunk will monitor these files one at a time instead of in parallel. Because it is not possible to predict the uncompressed size of a gzip file, Splunk processes these files in sequence for better control of disk allocation. With respect to performance, this is not ideal for handling 50k files, so please consider uncompressing them before having Splunk monitor them. That way you can take advantage of Splunk's multithreaded file monitoring capabilities.
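
If you do decide to unpack first, here is a minimal sketch of one way to do it in Python. This is not something Splunk ships. The function name `decompress_all` and the idea of a separate staging directory are my own assumptions for illustration, and it only handles a flat directory of `*.gz` files.

```python
import gzip
import shutil
from pathlib import Path


def decompress_all(src_dir: str, dest_dir: str) -> int:
    """Unpack every *.gz file in src_dir into dest_dir and return how many were written.

    Hypothetical helper: the name and the staging-directory layout are
    assumptions for this example, not part of any Splunk tooling.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    written = 0
    for gz_path in sorted(src.glob("*.gz")):
        out_path = dest / gz_path.stem  # "app.log.gz" -> "app.log"
        if out_path.exists():
            continue  # already unpacked on an earlier run
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)  # copy in chunks, not all at once
        written += 1
    return written
```

You would then point the monitor stanza at the staging directory instead of the directory holding the gzips. Since it skips files that already exist, it is probably safest to let a run finish before Splunk starts reading, so an interrupted run does not leave a truncated file behind.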

hurricanelabs
Path Finder

jrodman
Splunk Employee

I think we're actually a bit slower than uncompressing the files first (unsure of details) but it's not far off. Mostly uncompressing that much data with the zlib algorithm just takes a lot of CPU.

paulmarino
New Member

So, what if you don't want it to read a compressed file? Can you compress using a file extension that will prevent Splunk from attempting to index the file... e.g. rotated and compressed log files?

lguinn2
Legend

You can blacklist compressed files in inputs.conf so that they will be ignored:

[monitor:///var/log]
blacklist=(tgz$|zip$)

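This will ignore all files in /var/log whose names end with "tgz" or "zip". Note that plain ".gz" files do not match this pattern, so add gz$ to the alternation if you want those skipped too.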
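
If you want to sanity-check a blacklist pattern before putting it in inputs.conf, a rough sketch like the one below can help. It assumes Python's `re.search` behaves like Splunk's regex matching against the full file path, which should hold for simple patterns like these but is an approximation. The `BROADER` pattern, the `is_blacklisted` helper, and the sample paths are made up for illustration.

```python
import re

# Pattern copied from the blacklist line above.
ORIGINAL = re.compile(r"(tgz$|zip$)")

# Hypothetical broader pattern (an assumption, not from the answer above)
# that also covers ".gz" and ".bz2" rotated logs.
BROADER = re.compile(r"\.(gz|tgz|bz2|zip)$")


def is_blacklisted(pattern: re.Pattern, path: str) -> bool:
    """Return True if the pattern matches anywhere in the path."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    # Made-up sample paths for illustration.
    samples = [
        "/var/log/archive.tgz",
        "/var/log/backup.zip",
        "/var/log/messages.1.gz",
        "/var/log/messages",
    ]
    for path in samples:
        print(path, is_blacklisted(ORIGINAL, path), is_blacklisted(BROADER, path))
```

The rotated log `messages.1.gz` slips past the original pattern but is caught by the broader one, while the live `messages` file is left alone by both.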
