Loading a zip file for lookup?

smileyge · 11-15-2013

I have a ~250MB CSV file I want to use in a lookup. It takes forever to load into memory when I run the search, so I want to try zipping it to see whether that helps or hurts. I was at Splunk>Live yesterday and a couple of people suggested this approach. Problem: when I compress the file and try to add it as a new lookup, I get an error saying the file is binary and not gzipped. I've tried Windows compression, GNU gzip, and GNU gzip with Unix-style line endings in the CSV, with the extensions .gz, .zip, .csv.gz, and .csv.zip, and I always get the same error. The file is ~50MB compressed. Any suggestions?
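One thing worth ruling out: a file made with Windows' built-in "Compressed (zipped) folder" option is a .zip archive, not gzip, and renaming it to .gz does not change its contents. The error text suggests the uploader is checking for gzip data specifically (that is an inference from the message, not from Splunk documentation). The Python sketch below compresses a CSV with the standard-library `gzip` module and sniffs a file's first bytes to tell gzip (`1f 8b`) from zip (`PK\x03\x04`). The helper names `gzip_csv` and `sniff_format` are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"    # every gzip stream starts with these two bytes (RFC 1952)
ZIP_MAGIC = b"PK\x03\x04"   # local file header signature at the start of a typical .zip


def gzip_csv(src: Path, dst: Path) -> None:
    """Stream-compress src into a gzip file at dst (hypothetical helper)."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def sniff_format(path: Path) -> str:
    """Classify a file as 'gzip', 'zip', or 'unknown' by its leading bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

For example, `gzip_csv(Path("lookup.csv"), Path("lookup.csv.gz"))` followed by `sniff_format(Path("lookup.csv.gz"))` should report `"gzip"`; the file names are placeholders. Running `sniff_format` on each of the files already tried would show which ones are really gzip.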

On the page where you pick the file, the little help text even mentions uploading a zip, so I don't understand why this isn't working.

ChrisG · 11-15-2013

You should check your CSV for any special characters, beyond the line endings. See this previous Answers posting.
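A minimal way to follow that advice without guessing the file's encoding is to read it as raw bytes and count anything above 0x7F, plus carriage returns (the usual leftover of Windows line endings). `scan_csv_bytes` is a hypothetical helper; it walks the file line by line in pure Python, so expect it to take a while on a 250MB file.

```python
from pathlib import Path


def scan_csv_bytes(path: Path, limit: int = 10) -> dict:
    """Count non-ASCII bytes (> 0x7F) and carriage returns in a file.

    The file is read in binary mode, so no text encoding is assumed.
    Up to `limit` samples of (line_number, byte_value) are kept for
    non-ASCII bytes so they can be inspected by hand.
    """
    non_ascii = 0
    carriage_returns = 0
    samples = []
    with open(path, "rb") as f:
        # Iterating a binary file yields chunks ending in b"\n".
        for line_no, line in enumerate(f, start=1):
            carriage_returns += line.count(b"\r")
            if line.isascii():  # fast path: nothing above 0x7F on this line
                continue
            for byte in line:  # iterating bytes yields ints 0..255
                if byte > 0x7F:
                    non_ascii += 1
                    if len(samples) < limit:
                        samples.append((line_no, byte))
    return {
        "non_ascii": non_ascii,
        "carriage_returns": carriage_returns,
        "samples": samples,
    }
```

A pure-ASCII file with Unix line endings would come back as `{"non_ascii": 0, "carriage_returns": 0, "samples": []}`.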

ChrisG · 11-15-2013

That is...odd. Uploading a compressed CSV file should work fine. Just to troubleshoot the basics: can you uncompress the file and open it successfully? I'm assuming you've confirmed that there's nothing wrong with the compression itself, but I want to make sure. Beyond that, I don't have any other real ideas at the moment. 😕
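That basic check can be scripted: decompress the file, confirm it is byte-for-byte identical to the original, and make sure every row parses with the same number of fields as the header. This Python sketch assumes a UTF-8 CSV with a header row; `verify_gzip_roundtrip` is a hypothetical name, not a Splunk tool.

```python
import csv
import gzip
import hashlib
from pathlib import Path


def _sha256(stream) -> str:
    """Hash a binary stream in 1 MiB chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_gzip_roundtrip(original: Path, compressed: Path) -> int:
    """Check that `compressed` decompresses to exactly `original` and that
    every CSV row has as many fields as the header row.

    Returns the number of data rows. Raises ValueError on a mismatch; a file
    that is not gzip raises OSError (gzip.BadGzipFile on Python 3.8+).
    """
    with open(original, "rb") as f:
        expected = _sha256(f)
    with gzip.open(compressed, "rb") as f:
        actual = _sha256(f)
    if expected != actual:
        raise ValueError("decompressed bytes differ from the original CSV")

    # Assumption: UTF-8 text with a header row.
    with gzip.open(compressed, "rt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return 0
        rows = 0
        for row in reader:
            if len(row) != len(header):
                raise ValueError(
                    f"line {reader.line_num}: {len(row)} fields, expected {len(header)}"
                )
            rows += 1
    return rows
```

If this passes, the compressed file itself is sound and the problem is more likely on the upload side.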

smileyge · 11-15-2013

According to that post, [^x00-x7F] is the regex to find a "special character" in the file. That finds commas, and I have millions of those: it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress it I get the error. Am I throwing good time after bad? Can anyone comment on the merits of even trying to use a .gz file in a lookup instead of an uncompressed CSV, for performance reasons?
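The regex as quoted has no backslashes, which would explain the commas (whether they were lost in the original post or along the way is unclear). Without backslashes, `[^x00-x7F]` is a negated class over the literal characters `x`, `0`, `7`, `F` and the range `0`-`x` (0x30 to 0x78), so it matches commas, spaces, and newlines. The intended pattern is `[^\x00-\x7F]`, which matches only characters outside ASCII. A short Python illustration; `find_non_ascii` is a hypothetical helper:

```python
import re

# As quoted: the class holds 'x', '0', '7', 'F' and the range '0'-'x'
# (0x30-0x78); negated, it matches anything outside that, e.g. ',' (0x2C).
AS_QUOTED = re.compile(r"[^x00-x7F]")

# Intended: \x00-\x7F is the full ASCII range, so the negated class
# matches only non-ASCII characters.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text: str) -> list:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]
```

Applied to each line of the uncompressed CSV, `find_non_ascii` should return an empty list for a pure-ASCII file, while `AS_QUOTED` will flag every comma.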