Getting Data In

Why are lines in a gzip'ed CSV showing up twice in searches?

estepgi
New Member

Hi.

Just installed Splunk for the first time today. As a test, I took a CSV file and indexed it, and it worked fine. Then I created a new CSV file and gzip'ed it:

test.csv.gz
field,val
blah,whatever

It indexed fine. I then edited the file using vi, adding a new line:
newfield,morestuff

I then searched again. Now "newfield,morestuff" shows up once in the results, but "blah,whatever" shows up twice. I tried adding more lines and saw the same pattern: the most recent line shows up once, but the older lines are duplicated in the search results.
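
A minimal Python sketch of one plausible explanation, assuming that a monitored compressed file cannot be read incrementally and is instead decompressed and read again from the start every time it changes. The helper names (`write_gz`, `read_gz_lines`) and the `Counter` standing in for the index are hypothetical, and the sketch treats every line, header included, as one event. Under that assumption it reproduces the counts described above.

```python
import gzip
import os
import tempfile
from collections import Counter


def write_gz(path, lines):
    # Rewrites the whole archive, like "gunzip, edit, gzip again".
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


def read_gz_lines(path):
    # Assumption: a compressed file is decompressed from the beginning on
    # every change, so all lines are returned, old ones included.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


index = Counter()  # hypothetical stand-in for the events stored in the index

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv.gz")
    contents = ["field,val", "blah,whatever"]
    write_gz(path, contents)
    index.update(read_gz_lines(path))  # first pass: every line indexed once

    contents.append("newfield,morestuff")
    write_gz(path, contents)
    index.update(read_gz_lines(path))  # file changed: whole archive read again

print(index["blah,whatever"], index["newfield,morestuff"])  # 2 1
```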

I then added | dedup _raw to the search and the duplicates went away. However, I'm looking for a more elegant solution.
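
For illustration, a minimal Python sketch of what `| dedup _raw` does to the result list: only the first event with a given raw text is kept, and later identical events are dropped. The function name `dedup_raw` and the sample results are hypothetical. Note that this filtering happens at search time only; the duplicate events are still stored in the index.

```python
def dedup_raw(events):
    """Keep the first event for each distinct raw text, preserving order."""
    seen = set()
    unique = []
    for raw in events:
        if raw not in seen:
            seen.add(raw)
            unique.append(raw)
    return unique


# Hypothetical search results that contain the duplicated lines.
events = ["newfield,morestuff", "field,val", "blah,whatever", "field,val", "blah,whatever"]
print(dedup_raw(events))  # ['newfield,morestuff', 'field,val', 'blah,whatever']
```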

By the way, I also tried unzipping the file, editing it, then gzipping it again, with the same results.

Thanks for your help!


kbarker302
Communicator

It sounds like you were using the "upload" method of adding data to Splunk, which results in duplicates like the ones you describe. A better way is to have Splunk monitor your CSV file for changes (Add Data - Monitor - Files & Directories). That way, you can edit your CSV file as often as you like without re-uploading it, and Splunk will detect and index only the changes you've made.
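
A minimal Python sketch of why "index only the changes" works for a plain-text file that is appended to: a tailer that remembers a byte offset only needs to read what was written after that offset. The `tail_new_lines` function and its offset checkpoint are hypothetical and are not Splunk's actual implementation. A gzip'ed file offers no such shortcut, because compressed data generally cannot be decoded starting from an arbitrary byte position.

```python
import os
import tempfile


def tail_new_lines(path, offset):
    """Return the lines written after byte `offset`, plus the new offset.

    Hypothetical checkpointing tailer: plain text can be resumed from the
    previous end position, so already-read lines are skipped.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = [line for line in data.decode("utf-8").splitlines() if line]
    return lines, offset + len(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("field,val\nblah,whatever\n")
    first, offset = tail_new_lines(path, 0)

    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write("newfield,morestuff\n")
    second, offset = tail_new_lines(path, offset)

print(first)   # ['field,val', 'blah,whatever']
print(second)  # ['newfield,morestuff']
```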


estepgi
New Member

Thanks for the response. Actually, I was already using the "continuously monitor" option you recommend; I definitely don't want to re-upload my files. As I said, this works well for plain-text CSV files, but it leads to duplication for gzip'ed CSV files.
