Getting Data In

DB sources: Why am I getting duplicated CSV records?

paoli28
Observer

Hi! I'm just starting with Splunk, so I'd really appreciate some help because I've been stuck on this for several weeks.

I have a CSV file whose source is DB2. When I run the same query in Splunk as in DB2, I get duplicated records in Splunk. For example, in DB2 my query is

select * from table where field=value

and in Splunk I'm running

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" | table field1 field2 field3 field4

Does anyone know what is happening or how I can solve this? I really don't want to use dedup because I may not be able to see how the data changes from day to day.


ITWhisperer
SplunkTrust

Has your file.csv been indexed more than once? You could do something like this to see if the duplicates have different index times.

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table indextime field1 field2 field3 field4
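
As a possible follow-up check (a sketch, not part of the original reply), the same events can be counted per index day. It assumes the same index, sourcetype, and source names used above; indexday is a field name introduced here for illustration. If the file is rewritten once a day and re-indexed each time, every day should show roughly a full copy of the file's row count.

```
index="index1" sourcetype="csv" source="file.csv"
| eval indexday=strftime(_indextime, "%Y-%m-%d")
| stats count by indexday
| sort indexday
```

A count that repeats at about the same size on each day would suggest whole-file re-indexing rather than a few stray duplicate rows.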

paoli28
Observer

Thanks for the reply. Yes, the duplicates have different index times. Does the index time change if I recreate my file each day? Every time I extract from the DB, the file is recreated with the new information; I mean I'm recreating the file, not appending to it. How can I change that index time?

Sorry if I'm asking something super basic. I'm new to all of this, and thanks for the help.


ITWhisperer
SplunkTrust

Essentially, you can't change the index time. Consider it like this: if the file you are monitoring changes, it will get indexed from the point of change. Normally, with log files, this is fine, since the log events are written to the tail of the log file, or at a significant time change (e.g. a new day or a new hour) a new file is created, either with a new name, or the existing one is renamed and a new file is started with the same name. Splunk will look for these sorts of differences and add the new events.

When you overwrite the file, as you are doing in your case, Splunk treats it as a new (log) file and starts indexing from the beginning of the file, hence the re-indexing of "duplicate" events. Rather than writing your CSV file to an index, you could consider using the outputlookup command to write the values to a lookup store.
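
One way the outputlookup suggestion might look in practice is sketched below. This is only an illustration under stated assumptions: the file is rewritten about once a day, so snapping _indextime to the start of the day groups each load together; db2_snapshot.csv is a hypothetical lookup file name; indexday and latest_day are field names introduced for the example. The search keeps only the rows from the most recent load and writes them to the lookup, which outputlookup overwrites by default.

```
index="index1" sourcetype="csv" source="file.csv"
| eval indexday=relative_time(_indextime, "@d")
| eventstats max(indexday) as latest_day
| where indexday=latest_day
| table field1 field2 field3 field4
| outputlookup db2_snapshot.csv
```

The current snapshot could then be read back with inputlookup instead of searching the index:

```
| inputlookup db2_snapshot.csv
| where field1="value"
```

The indexed events are left untouched, so the day-by-day history remains searchable by index time, while the lookup holds only the latest copy and avoids the need for dedup.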
