Getting Data In

DB sources: Why am I getting duplicated records from my CSV file?

paoli28
Observer

Hi! I'm new to Splunk and would really appreciate some help, because I've been stuck on this for several weeks.

I have a CSV file whose source is DB2. When I run the same query in Splunk as in DB2, I get duplicated records in Splunk. For example, in DB2 my query is:

select * from table where field=value

and in Splunk I'm running:

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" | table field1 field2 field3 field4

Does anyone know what is happening or how I can solve this? I'd rather not use dedup, because then I might not be able to see how the data changes from day to day.


ITWhisperer
SplunkTrust

Has your file.csv been indexed more than once? You could do something like this to see if the duplicates have different index times.

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table indextime field1 field2 field3 field4
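As a variation on the search above, counting events per index day can show at a glance how many times the file has been loaded. This is only a sketch: it assumes the same index, sourcetype, and source as the original search, and indexday and events are names introduced here for illustration.

```spl
index="index1" sourcetype="csv" source="file.csv"
| eval indexday=strftime(_indextime, "%Y-%m-%d")
| stats count AS events BY indexday
| sort indexday
```

If the file is re-indexed on every extract, each day should appear as its own row with roughly the full row count of the file.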

paoli28
Observer

Thanks for the reply. Yes, the duplicates have different index times. Does the index time change if I recreate my file each day? Every time I extract from the DB, the file is recreated with the new information; that is, I'm overwriting the file, not appending to it. How can I change the index time?

Sorry if I'm asking something very basic; I'm new to all of this. Thanks for the help.


ITWhisperer
SplunkTrust

Essentially, you can't change the index time. Think of it this way: if the file you are monitoring changes, it gets indexed from the point of change. Normally, with log files, this is fine, since log events are written to the tail of the file, or, at a significant time change (e.g. a new day or a new hour), a new file is created, either with a new name, or the existing one is renamed and a new file is started with the same name. Splunk looks for these sorts of differences and adds the new events.

When you overwrite the file, as you are doing, Splunk treats it as a new (log) file and starts indexing from the beginning of the file, hence the re-indexing of "duplicate" events. Rather than writing your CSV file to an index, you could consider using the outputlookup command to write the values to a lookup store.
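If the data stays in the index, one way to avoid dedup is to keep only the events from the most recent load. The search below is a rough sketch, not a tested solution: it assumes one extract per day, and load_day and latest_load are names made up for this example. relative_time snaps each event's _indextime to the start of its day, and eventstats attaches the newest such day to every event.

```spl
index="index1" sourcetype="csv" source="file.csv"
| eval load_day=relative_time(_indextime, "@d")
| eventstats max(load_day) AS latest_load
| where load_day=latest_load AND field="value"
| table field1 field2 field3 field4
```

Because older loads remain in the index, changing the where clause to compare load_day against another day should still let you see how the data looked on that day.

For the lookup approach, the same filter could feed outputlookup. The file name db2_extract.csv is hypothetical; by default outputlookup overwrites the lookup file, so it would hold only the latest load.

```spl
index="index1" sourcetype="csv" source="file.csv"
| eval load_day=relative_time(_indextime, "@d")
| eventstats max(load_day) AS latest_load
| where load_day=latest_load
| table field1 field2 field3 field4
| outputlookup db2_extract.csv
```

The lookup could then be read back with something like | inputlookup db2_extract.csv | where field="value".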
