Getting Data In

Why are lines in a gzip'ed CSV showing up twice in searches?

estepgi
New Member

Hi.

Just installed Splunk for the first time today. As a test, I took a CSV file and indexed it, and it worked fine. Then I created a new file in CSV format and gzip'ed it.

test.csv.gz
field,val
blah,whatever

It indexed fine. I then edited the file using vi, adding a new line:
newfield,morestuff

I then searched the results again. Now "newfield,morestuff" shows up once in the results, but "blah,whatever" shows up twice. I tried adding more lines and saw the same pattern: the most recent line shows up once, but the older lines are duplicated in the search results.

I then added | dedup _raw to the search and the duplicates went away. However, I'm looking for a more elegant solution.

By the way, I also tried unzipping the file, editing it, then gzipping it again, with the same results.

Thanks for your help!

kbarker302
Communicator

It sounds like you were using the "upload" method of adding data to Splunk, which would produce duplicates the way you've described. A better way would be to have Splunk monitor your CSV file for changes (Add Data - Monitor - Files & Directories). That way, you can make as many changes as you want to your CSV file without having to re-upload it, and Splunk will detect and index only the changes you've made.

estepgi
New Member

Thanks for the response. Actually, I was already using the "continuously monitor" option that you recommend. I definitely don't want to re-upload my files. As I said, this works well for plaintext CSV files, but it leads to duplication for gzip'ed CSV files.
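
A possible explanation for the difference, offered as an assumption rather than confirmed Splunk behavior: appending a line to a plaintext file leaves the existing bytes untouched, so a monitor that remembers its read offset can pick up just the new line. A gzip'ed file that is edited and recompressed is not "old bytes plus new bytes", so a monitor would likely have to decompress and read the whole file again, which would re-index the older lines. The Python sketch below only illustrates the byte-level difference using the sample data from this thread; it does not call Splunk, and the variable names are made up for illustration.

```python
import gzip

# Sample data from this thread: the original file and the edited file.
old_text = b"field,val\nblah,whatever\n"
new_text = old_text + b"newfield,morestuff\n"

# Plaintext: the edited file is the old file plus extra bytes at the end,
# so reading from the previous end-of-file offset yields only the new line.
plain_is_append = new_text.startswith(old_text)

# gzip: compress each version (mtime=0 keeps the gzip header deterministic).
old_gz = gzip.compress(old_text, mtime=0)
new_gz = gzip.compress(new_text, mtime=0)

# The old archive ends with an end-of-block marker plus a CRC32 and length
# trailer, so the new archive is not the old archive with bytes appended.
gzip_is_append = new_gz.startswith(old_gz)

print("plaintext is a pure append:", plain_is_append)
print("gzip is a pure append:", gzip_is_append)
```

If that assumption holds, keeping the monitored file uncompressed, or writing new lines to a new compressed file instead of changing an existing one, should avoid the duplicates without needing | dedup _raw.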
