Splunk Search

loading a zip file for lookup?

smileyge
Path Finder

I have a ~250MB CSV file I want to use in a lookup. It takes forever to load into memory when I run the search, so I want to try zipping it to see if that helps or hurts. I was at Splunk>Live yesterday and a couple of folks suggested this approach. Problem: when I zip the file and try to add a new lookup, I get an error that says the file is binary and not gzipped. I've tried Windows compression, GNU gzip, and GNU gzip with Unix-style line endings in the CSV, with the extensions .gz, .zip, .csv.gz, and .csv.zip, and I always get the same error. The file is ~50MB compressed. Any suggestions?

The little help text on the page where you pick the file even talks about loading a zip, so I don't understand why this isn't working.

0 Karma

ChrisG
Splunk Employee

You should check your CSV for any special characters, beyond the line endings. See this previous Answers posting.

0 Karma

ChrisG
Splunk Employee

That is...odd. Uploading a compressed CSV file should work fine. Just to troubleshoot the basics: can you uncompress the file and open it successfully? I'm assuming you've confirmed that there's nothing wrong with the compression itself, but want to confirm. And I don't have any other real ideas at the moment. 😕
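
One way to run that basic check outside Splunk is sketched below in Python. The wording of the error ("binary and not gzipped") suggests the upload expects a gzip stream rather than a ZIP archive, but that is an inference from the message, not something confirmed in this thread. The function names `sniff_compression` and `gzip_round_trip_ok` are made up for this illustration; only the standard-library `gzip` and `zlib` modules are used.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"     # first two bytes of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"    # local file header that starts a non-empty ZIP archive


def sniff_compression(path):
    """Return 'gzip', 'zip', or 'unknown' based on the file's leading bytes.

    The file extension is ignored on purpose: renaming a .zip to .gz does
    not change what is inside it.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def gzip_round_trip_ok(path):
    """Decompress the whole file in chunks; True if it reads cleanly as gzip."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzipped file"; EOFError covers truncation;
        # zlib.error covers corrupt compressed data.
        return False
```

If `sniff_compression` reports `zip` for a file named `.gz`, the archive came from a ZIP tool (for example, Windows compressed folders) and was only renamed. If it reports `gzip` and the round trip succeeds, the compression itself is probably not the problem.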

0 Karma

smileyge
Path Finder

According to that post [^x00-x7F] is the regex to find a "special character" in the file. That finds commas. I have millions of those - it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress I get the error. Am I spending good time after bad? Can anyone comment on even the merits of trying to use a .gz file in a lookup vs. an uncompressed CSV for performance reasons?
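
A possible explanation for the comma matches, sketched in Python: the pattern as quoted here has no backslashes, so `0-x` inside the brackets is read as a character range from `0` to `x`, and the negated class matches everything outside it, including commas. The backslashed form `[^\x00-\x7F]` is assumed to be what the linked post intended (that post is not reproduced in this thread); it matches only characters outside 7-bit ASCII.

```python
import re

# Pattern exactly as quoted in the thread (no backslashes). Inside the class,
# "0-x" is a range from "0" (0x30) to "x" (0x78), so the negated class matches
# any character outside that range, and "," (0x2C) is outside it.
as_posted = re.compile(r"[^x00-x7F]")

# Assumed intended pattern: any character outside the 7-bit ASCII range.
non_ascii = re.compile(r"[^\x00-\x7F]")

plain_row = "host,owner"
accented_row = "caf\u00e9,owner"   # contains one non-ASCII character

print(as_posted.findall(plain_row))     # [',']
print(non_ascii.findall(plain_row))     # []
print(non_ascii.findall(accented_row))  # ['é']
```

Note that either pattern only makes sense against the uncompressed CSV. A gzip file is binary by design, so nearly every byte in it would count as a "special character".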

0 Karma