Splunk Search

loading a zip file for lookup?

smileyge
Path Finder

I have a ~250MB csv file I want to use in a lookup, it takes forever when I do the search to get it into memory so I want to try to zip it and see if that helps or hurts. Was at Splunk>Live yesterday and this was a suggested approach by a couple folks. Problem: when I zip the file and try to add a new lookup I get an error that says file is binary and not gzipped. I've tried windows compression, GNU gzip, gnu gzip with unix style endlines in the CSV, with extension .gz, .zip, .csv.gz, .csv.zip, always get same error. File is ~50MB compressed. Any suggestions?

I see on the page where you pick the file the little help thing even talks about loading a zip so I don't understand why this isn't working.

Tags (1)
0 Karma

smileyge
Path Finder

According to that post [^x00-x7F] is the regex to find a "special character" in the file. That finds commas. I have millions of those - it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress I get the error. Am I spending good time after bad? Can anyone comment on even the merits of trying to use a .gz file in a lookup vs. an uncompressed CSV for performance reasons?

0 Karma

ChrisG
Splunk Employee
Splunk Employee

You should check your CSV for any special characters, beyond the line endings. See this previous Answers posting.

0 Karma

ChrisG
Splunk Employee
Splunk Employee

That is...odd. Uploading a compressed CSV file should work fine. Just to troubleshoot the basics: can you uncompress the file and open it successfully? I'm assuming you've confirmed that there's nothing wrong with the compression itself, but want to confirm. And I don't have any other real ideas at the moment. 😕

0 Karma

smileyge
Path Finder

According to that post [^x00-x7F] is the regex to find a "special character" in the file. That finds commas. I have millions of those - it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress I get the error. Am I spending good time after bad? Can anyone comment on even the merits of trying to use a .gz file in a lookup vs. an uncompressed CSV for performance reasons?

0 Karma
Get Updates on the Splunk Community!

Splunk Observability Cloud | Unified Identity - Now Available for Existing Splunk ...

Raise your hand if you’ve already forgotten your username or password when logging into an account. (We can’t ...

Index This | How many sides does a circle have?

February 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Registration for Splunk University is Now Open!

Are you ready for an adventure in learning?   Brace yourselves because Splunk University is back, and it's ...