Getting Data In

Why are lines in a gzip'ed CSV showing up twice in searches?

estepgi
New Member

Hi.

I just installed Splunk for the first time today. As a test, I took a CSV file and indexed it, and it worked fine. Then I created a new file in CSV format and gzip'ed it.

test.csv.gz
field,val
blah,whatever

It indexed fine. I then edited the file using vi, adding a new line:
newfield,morestuff

I then searched the results again. Now "newfield,morestuff" shows up once in the results, but "blah,whatever" shows up twice. I tried adding more lines and saw the same pattern - the most recent line shows up once, but the older lines are duplicated in the search results.

I then added | dedup _raw to the search and the duplicates went away. However, I'm looking for a more elegant solution.

By the way, I also tried unzipping the file, editing it, then gzipping it again, with the same results.

Thanks for your help!

kbarker302
Communicator

It sounds like you were using the "upload" method of adding data to Splunk, which would produce duplicates in the way you've described. A better approach is to have Splunk monitor your CSV file for changes (Add Data - Monitor - Files & Directories). That way, you can change the file as often as you like without re-uploading it, and Splunk will detect and index only the changes you've made.

estepgi
New Member

Thanks for the response. Actually, I was already using the "continuously monitor" option that you recommend. I definitely don't want to re-upload my files. As I said, this works well for plaintext CSV files, but it leads to duplication for gzip'ed CSV files.
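
The thread does not say why gzip'ed files behave differently from plaintext ones. One possible explanation, offered here as an assumption and not confirmed by anyone above, is that an appended plaintext file is a byte-for-byte extension of its earlier version, while a re-saved gzip archive is not, so a monitor that resumes from a remembered position has nothing reliable to resume from and may read the whole archive again. The Python sketch below only illustrates that difference in the bytes on disk. The helper `is_pure_append` is a hypothetical name, and nothing here reproduces Splunk's actual change-detection logic.

```python
import gzip


def is_pure_append(old: bytes, new: bytes) -> bool:
    """Return True if `new` is exactly `old` with extra bytes at the end."""
    return len(new) > len(old) and new.startswith(old)


# The sample file from the question, before and after the edit.
old_text = b"field,val\nblah,whatever\n"
new_text = old_text + b"newfield,morestuff\n"

# mtime=0 fixes the timestamp in the gzip header, so any difference
# between the two archives comes from the compressed payload and trailer.
old_gz = gzip.compress(old_text, mtime=0)
new_gz = gzip.compress(new_text, mtime=0)

# Plaintext: the new file is the old file plus one more line.
plain_append = is_pure_append(old_text, new_text)

# Gzip: the old archive ends with its own checksum and length trailer,
# so the new archive is not simply the old one with bytes added.
gzip_append = is_pure_append(old_gz, new_gz)

# After decompression the append relationship is visible again.
decompressed_append = is_pure_append(
    gzip.decompress(old_gz), gzip.decompress(new_gz)
)

print("plaintext is a pure append:   ", plain_append)
print("gzip bytes are a pure append: ", gzip_append)
print("decompressed is a pure append:", decompressed_append)
```

If this assumption holds, it would also be consistent with the observation that unzipping, editing, and gzipping again gives the same result as editing the archive in vi: either way the archive on disk is rewritten as a whole.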
