Splunk Search

loading a zip file for lookup?

smileyge
Path Finder

I have a ~250 MB CSV file that I want to use in a lookup. It takes forever to load into memory when I run the search, so I want to try compressing it and see whether that helps or hurts. I was at Splunk>Live yesterday, and a couple of folks suggested this approach. The problem: when I compress the file and try to add a new lookup, I get an error saying the file is binary and not gzipped. I've tried Windows compression, GNU gzip, and GNU gzip with Unix-style line endings in the CSV, with the extensions .gz, .zip, .csv.gz, and .csv.zip, and I always get the same error. The file is ~50 MB compressed. Any suggestions?

The little help text on the page where you pick the file even talks about loading a zip, so I don't understand why this isn't working.
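One thing worth ruling out is whether each attempt actually produced a gzip stream. Windows' built-in compression writes ZIP archives, which are a different format from gzip no matter what extension the file carries. The sketch below only inspects the leading bytes of a file; it says nothing about how Splunk itself validates uploads, and the function name `sniff_compression` is made up for illustration.

```python
GZIP_MAGIC = b"\x1f\x8b"    # first two bytes of every gzip stream (RFC 1952)
ZIP_MAGIC = b"PK\x03\x04"   # local file header signature of a non-empty ZIP archive


def sniff_compression(path):
    """Return 'gzip', 'zip', or 'unknown' based on the file's leading bytes.

    Hypothetical helper: it checks the on-disk format only and ignores
    the file extension entirely.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head == ZIP_MAGIC:
        return "zip"
    return "unknown"
```

Running it against each of the files you tried (for example `sniff_compression("mylookup.csv.gz")`, where the filename is a placeholder) would show which ones are really gzip and which are ZIP archives that were only renamed.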

ChrisG
Splunk Employee

You should check your CSV for any special characters, beyond the line endings. See this previous Answers posting.

ChrisG
Splunk Employee

That is...odd. Uploading a compressed CSV file should work fine. Just to troubleshoot the basics: can you uncompress the file and open it successfully? I assume you've already checked that there's nothing wrong with the compression itself, but I want to be sure. I don't have any other real ideas at the moment. 😕
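A quick way to run that sanity check without unpacking 250 MB by hand is to stream the file through Python's standard `gzip` and `csv` modules. This is a minimal sketch: the helper name `check_gzipped_csv` is hypothetical, and UTF-8 is an assumed encoding that may not match the real file.

```python
import csv
import gzip


def check_gzipped_csv(path, encoding="utf-8"):
    """Decompress the whole file and return (header, data_row_count).

    Reading to the end forces the full gzip stream to be decoded, so a
    corrupt or non-gzip file raises an exception (an OSError subclass,
    EOFError, or zlib.error) instead of failing silently.
    The encoding default is an assumption, not something from the thread.
    """
    with gzip.open(path, "rt", encoding=encoding, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        rows = sum(1 for _ in reader)
    return header, rows
```

If this returns the expected header and row count, the compression itself is fine and the problem lies elsewhere. A `UnicodeDecodeError` would instead point at bytes that are not valid in the assumed encoding.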

smileyge
Path Finder

According to that post, [^x00-x7F] is the regex to find a "special character" in the file. That finds commas, and I have millions of those, since it's a compressed CSV file. The file imports fine when I don't compress it, but when I compress it I get the error. Am I throwing good time after bad? Can anyone comment on the merits of even trying to use a .gz file in a lookup versus an uncompressed CSV for performance reasons?
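A note on the regex as quoted: without backslashes, `[^x00-x7F]` is a negated class containing the literal characters `x`, `7`, `F` and the range `0` through `x`, so it matches commas, spaces, and newlines. The linked post isn't reproduced here, so this is an assumption, but the pattern was probably meant to be `[^\x00-\x7F]`, the usual idiom for "any non-ASCII character". The sketch below compares the two; `find_special` is a hypothetical helper.

```python
import re

# The pattern exactly as quoted in the thread (no backslashes).
WITHOUT_BACKSLASHES = re.compile(r"[^x00-x7F]")
# Assumed intended pattern: any character outside the 7-bit ASCII range.
WITH_BACKSLASHES = re.compile(r"[^\x00-\x7F]")


def find_special(pattern, text):
    """Return the distinct characters in text that the pattern matches."""
    return sorted(set(pattern.findall(text)))


ascii_csv = "host,status\nweb01,ok\n"
non_ascii_csv = "host,city\nweb01,Zürich\n"

print(find_special(WITHOUT_BACKSLASHES, ascii_csv))   # includes ',' and '\n'
print(find_special(WITH_BACKSLASHES, ascii_csv))      # empty for pure ASCII
print(find_special(WITH_BACKSLASHES, non_ascii_csv))  # only the 'ü'
```

With the backslashes in place, a plain ASCII CSV produces no matches, which makes the check meaningful on the uncompressed file. Running either pattern against the compressed file is not informative, because gzip output is arbitrary binary data.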
