How to index big zip files?

dillencehsu
Path Finder

I have a few zip files in a folder (once extracted, they contain thousands of CSV files). Each zip file is over 1 GB.

I use a monitor stanza to watch this folder, but Splunk did not index these zip files.

Splunk 7.3.3 (standalone)

[monitor://D:\zipfolder]
index = my_index
sourcetype = my_sourcetype
crcSalt = <SOURCE>

Any suggestions?

Thanks.

1 Solution

richgalloway
SplunkTrust

Unzip the files.

---
If this reply helps you, Karma would be appreciated.

PickleRick
SplunkTrust

Splunk doesn't index the compressed files directly. It has to uncompress them into a temporary directory first.

That's probably where it is failing. Do you have enough free space for decompression? Text files typically compress very well, so it's not uncommon to need about ten times the archive size in free space to unpack an archive.

To make sure ingestion works reliably, just uncompress the files yourself before ingesting them.
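PickleRick's free-space concern can be checked before ingesting anything. The sketch below uses Python's standard zipfile module to total the uncompressed sizes recorded in an archive, and shutil.disk_usage to compare that total with the free space on a target drive. The function names and the 10% headroom are illustrative assumptions. The sketch also does not know where Splunk unpacks archives internally, so point target_dir at whichever drive you plan to extract onto.

```python
import os
import shutil
import zipfile


def uncompressed_size(zip_path):
    """Sum of the uncompressed sizes (bytes) recorded for every member of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def compression_ratio(zip_path):
    """Uncompressed size divided by the archive's size on disk."""
    return uncompressed_size(zip_path) / os.path.getsize(zip_path)


def has_room_to_extract(zip_path, target_dir, headroom=1.1):
    """True if target_dir's drive has at least (uncompressed size * headroom) bytes free.

    headroom=1.1 (a 10% safety margin) is an illustrative assumption, not a Splunk setting.
    """
    needed = uncompressed_size(zip_path) * headroom
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

Running compression_ratio on one of the 1 GB archives gives its actual ratio, which you can compare with the roughly tenfold estimate above.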

vinoth_raj
Path Finder

Guys, any thoughts about this?

dillencehsu
Path Finder

Unzip these files, or use a batch input (batch://.../*.zip) to ingest the zip files.

If you have a large number of zip files, write a shell script to unzip them.
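The reply above suggests a shell script. Because the folder in the question is on Windows (D:\zipfolder), here is a cross-platform sketch in Python instead, using only the standard zipfile and pathlib modules. It extracts every .zip in one folder into its own subfolder of a destination folder, so identically named CSV files from different archives don't overwrite each other. The function name unzip_all and any destination folder are hypothetical; this is not a Splunk tool.

```python
import zipfile
from pathlib import Path


def unzip_all(src_dir, dest_dir):
    """Extract every *.zip directly under src_dir into dest_dir/<archive name>/.

    Returns the paths of the extracted files (directory entries are skipped).
    """
    src, dest = Path(src_dir), Path(dest_dir)
    extracted = []
    for zip_path in sorted(src.glob("*.zip")):
        # One subfolder per archive so same-named CSVs in different archives
        # don't overwrite each other.
        out_dir = dest / zip_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)
            extracted.extend(
                out_dir / name for name in zf.namelist() if not name.endswith("/")
            )
    return extracted
```

For example, unzip_all(r"D:\zipfolder", r"D:\unzipped") would extract everything into D:\unzipped (a hypothetical folder name), which a monitor stanza could then watch in place of the zip folder.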

dillencehsu
Path Finder

Yes. In the end, I used a shell script to unzip thousands of zip files.

But I'd still like to know what happened on the Splunk side.

Is it that ArchiveProcessor cannot unzip a large zip file, or that unzipping a large zip file takes so long that Splunk skips it?
Is there any size limit for zip files when indexing?

richgalloway
SplunkTrust

I don't know why Splunk wouldn't read the zip files.  Perhaps, as you suggest, they're too big.  Is there anything in the logs about it?

dillencehsu
Path Finder

Every zip file produces the same log entries:

12-15-2020 14:17:28.450 +0900 INFO ArchiveProcessor - Handling file=D:\logfile.zip
12-15-2020 14:17:28.450 +0900 INFO ArchiveProcessor - reading path=D:\logfile.zip (seek=0 len=1153047505)
12-15-2020 14:17:59.761 +0900 INFO ArchiveProcessor - Finished processing file 'D:\logfile.zip', removing from stats
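These three entries show the archive being opened, read (len=1153047505 bytes, about 1.15 GB) and finished, with no error line among them. To compare timings across all of the zip files, the Python sketch below pairs each "Handling file=" line with its "Finished processing file" line and reports the elapsed seconds. It assumes the lines come from splunkd.log in exactly the format shown here; the regular expression and the function name archive_durations are illustrative, not part of Splunk.

```python
import re
from datetime import datetime

# Matches ArchiveProcessor lines in the format shown above, e.g.
# 12-15-2020 14:17:28.450 +0900 INFO ArchiveProcessor - Handling file=D:\logfile.zip
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"\w+ ArchiveProcessor - (?P<msg>.*)$"
)
FINISHED_RE = re.compile(r"Finished processing file '(?P<path>.*)'")


def archive_durations(lines):
    """Map each archive path to seconds between 'Handling file=' and 'Finished processing file'."""
    started, durations = {}, {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an ArchiveProcessor line
        ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S.%f %z")
        msg = m["msg"]
        if msg.startswith("Handling file="):
            started[msg[len("Handling file="):]] = ts
            continue
        done = FINISHED_RE.match(msg)
        if done and done["path"] in started:
            path = done["path"]
            durations[path] = (ts - started.pop(path)).total_seconds()
    return durations
```

For the three lines above, the elapsed time is 31.311 seconds (14:17:28.450 to 14:17:59.761).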
