Splunk fails to monitor zip file

sdwilkerson
Contributor

Hello,

I am trying to have Splunk monitor standard scan reports from Foundstone (a vulnerability assessment scanner), but I repeatedly see this in splunkd.log:

11-22-2011 17:13:26.759 -0500 ERROR ArchiveFile - In archive '/data/splunk/splunk-4.2.4/var/spool/splunk/Monthly-Full-2010-102811.csv.zip': Bad ZIP file

This zip file opens fine on Windows with the built-in zip support, and on Linux with "unzip".

  • Any ideas what is causing the problem?
  • Is it possible that Foundstone uses a compression algorithm that Splunk doesn't understand and if so, how can we test for this?
  • Any idea on how to get around it besides a scripted input?

Thanks,
Sean


sdwilkerson
Contributor

Answering my own question.

The problem we found is that Foundstone saves the CSV report in a hierarchical directory structure and uses Windows-style backslash characters to separate directories. This is normally OK, but I believe the Foundstone zipping function writes the first directory separator in such a way that Linux/Python interpret it as a literal backslash character in the name and not as a directory separator.

The Linux unzip command shows that the file is not corrupt, but the entry names look odd (note the backslash after "18"):

sean@ubuntu:/tmp/temp$ unzip -lvt Monthly-Full-2010-102811.csv.zip 
Archive:  Monthly-Full-2010-102811.csv.zip
    testing: 18\CSV/en/authenticated_hosts.csv   OK
    testing: 18\CSV/en/csvmanifest.xml   OK
    testing: 18\CSV/en/network_assets.csv   OK
    testing: 18\CSV/en/vulnerabilities.csv   OK
No errors detected in compressed data of Monthly-Full-2010-102811.csv.zip.
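
For anyone who wants to run the same check from Python, and also see which compression method each entry uses (the original question asked how to test for that), here is a minimal sketch using the standard zipfile module. The function name describe_entries is made up for illustration, and this is not something Splunk itself runs; it only reports what the archive contains. On Linux, zipfile leaves a backslash in an entry name as-is, so it shows up in the output.

```python
import zipfile

# Human-readable names for the compression methods zipfile knows about.
COMPRESSION_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def describe_entries(zip_path):
    """Return a list of (entry_name, compression, has_backslash) tuples.

    Hypothetical helper for inspecting an archive; it does not extract anything.
    """
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            method = COMPRESSION_NAMES.get(
                info.compress_type, "unknown (%d)" % info.compress_type
            )
            rows.append((info.filename, method, "\\" in info.filename))
    return rows
```

If every entry reports "stored" or "deflated", the compression method is unlikely to be the issue, and a True in the last column points at the entry names instead.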

I believe that Splunk's monitoring process is doing some input validation and getting stuck on this backslash character.

The way I found to get around this issue is to write a small wrapper that unzips the file in advance, and then have Splunk monitor the extracted files.
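
The wrapper itself was not posted, so the following is only a sketch of what such a pre-extraction step could look like, not the actual script. It assumes Python's standard zipfile module, treats every backslash in an entry name as a directory separator, and refuses to write outside the destination directory. The function name extract_normalized and any paths you pass to it are hypothetical.

```python
import os
import shutil
import zipfile


def extract_normalized(zip_path, dest_dir):
    """Extract zip_path into dest_dir, treating backslashes in entry
    names as directory separators. Returns the list of files written.

    Hypothetical helper; not the wrapper used in the original answer.
    """
    written = []
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # "18\CSV/en/x.csv" becomes "18/CSV/en/x.csv"
            name = info.filename.replace("\\", "/")
            if name.endswith("/"):
                continue  # directory entry, nothing to write
            target = os.path.realpath(os.path.join(dest_root, *name.split("/")))
            # Guard against entries such as "../../etc/passwd".
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("entry escapes destination: %r" % info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

You would call it with the path of the downloaded report and a directory that Splunk monitors, for example from cron or from whatever job drops the report on the box, so that Splunk sees plain CSV files instead of the zip.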

I found no output options in the Foundstone management UI that could control this behavior.

Best,

Sean

sdwilkerson
Contributor

With Foundstone or some other application?

hartfoml
Motivator

Great, this is exactly what I needed. If it's not too much trouble, can you post the unzip code you used? Thanks ever so much. I am using Foundstone too and want to get the scan data directly without the operator having to uncompress the reports.

hartfoml
Motivator

I am having the same issue. Did you ever find a solution?