Db-Sources-Why am I getting CSV Duplicated Records?

paoli28
Observer

Hi! I'm just starting with Splunk, so I'd really appreciate some help, because I've been stuck on this for several weeks.

I have a CSV file whose source is DB2. When I run the same query in Splunk as in DB2, I get duplicated records in Splunk. Example: in DB2 my query is select * from table where field=value, and in Splunk I'm running

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value"
| table field1 field2 field3 field4

Does anyone know what is happening or how I can solve this? I really don't want to use dedup, because then I may not be able to see how the data changes from day to day.

ITWhisperer
SplunkTrust

Has your file.csv been indexed more than once? You could do something like this to see if the duplicates have different index times.

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table indextime field1 field2 field3 field4
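As a follow-up sketch in the same spirit, the search below counts how many copies of each row exist and when the first and last copies were indexed. It reuses the placeholder names from the question (index1, file.csv, field, field1 to field4) and assumes that field1 to field4 together identify a row, which may not hold for your table. Note that stats ... BY drops rows where any of the BY fields is empty.

```spl
index="index1" sourcetype="csv" source="file.csv"
| where field="value"
| stats count AS copies min(_indextime) AS first_indexed max(_indextime) AS last_indexed BY field1 field2 field3 field4
| where copies > 1
| eval first_indexed=strftime(first_indexed,"%Y-%m-%d %H:%M:%S"), last_indexed=strftime(last_indexed,"%Y-%m-%d %H:%M:%S")
```

If copies roughly matches the number of days the file has been regenerated, that points to the whole file being re-read on each extract rather than to duplicates in the DB2 export itself.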

paoli28
Observer

Thanks for the reply. Yes, the duplicates have different index times. Does the index time change if I recreate my file each day? Every time I extract from the DB, the file is recreated with the new information; I mean I'm recreating the file, not appending to it. How can I change that index time?

Sorry if I'm asking something super basic. I'm new to all of this, and thanks for the help.


ITWhisperer
SplunkTrust

Essentially, you can't change the index time. Consider it like this: if the file you are monitoring changes, it will get indexed from the point of change. Normally, with log files, this is fine, since log events are written to the tail of the file, or at a significant time change (e.g. a new day or a new hour) a new file is created, either with a new name, or the existing one is renamed and a new file is started with the same name. Splunk looks for these sorts of differences and adds the new events.

When you overwrite the file, as you are doing in your case, Splunk treats it as a new (log) file and starts indexing from the beginning of the file, hence the re-indexing of "duplicate" events. Rather than writing your CSV file to an index, you could consider using the outputlookup command to write the values to a lookup store.
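One possible way to populate such a lookup from the data that is already indexed is sketched below. It keeps only the events from the most recent load day and writes them out with outputlookup, which overwrites the lookup file by default. The lookup name db2_table_snapshot.csv is made up for illustration, and the sketch assumes the file is extracted once per day and fully indexed within the same calendar day; the search time range also has to be wide enough to include the latest load.

```spl
index="index1" sourcetype="csv" source="file.csv"
| eval load_day=relative_time(_indextime, "@d")
| eventstats max(load_day) AS latest_load_day
| where load_day=latest_load_day
| table field field1 field2 field3 field4
| outputlookup db2_table_snapshot.csv
```

The snapshot can then be read back with something like: | inputlookup db2_table_snapshot.csv | where field="value" | table field1 field2 field3 field4. The indexed events are left untouched, so the day-by-day history stays available in the index; adding load_day to a table or stats over the index (without the latest-day filter) shows how the rows changed between loads without resorting to dedup.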
