Db-Sources: Why am I getting duplicated CSV records?

paoli28 · 11-29-2022

Hi! I'm new to Splunk, so I'd really appreciate some help; I've been stuck on this for several weeks.

I have a CSV file whose source is DB2. When I run the same query in Splunk as in DB2, I can see that Splunk returns duplicated information. For example, in DB2 my query is select * from table where field=value, and in Splunk I'm running:

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value"
| table field1 field2 field3 field4

Does anyone know what is happening or how I can solve this? I really don't want to use dedup, because then I may not be able to see how the data changes from day to day.

ITWhisperer · 11-29-2022

Has your file.csv been indexed more than once? You could do something like this to see if the duplicates have different index times.

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table indextime field1 field2 field3 field4
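If the row-by-row listing is long, a summary can make repeated indexing easier to spot. The sketch below counts events per index day; if the file is rewritten once a day, you would expect a similar-sized block of events for each day the file was re-read. The field name index_day is only an illustrative choice.

```spl
((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value"
| eval index_day=strftime(_indextime, "%Y-%m-%d")
| stats count BY index_day
```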

paoli28 · 11-29-2022

Thanks for the reply. Yes, the duplicates have different index times. Does the index time change if I recreate my file each day? Every time I extract from the DB, the file is recreated with the new information; I mean I'm overwriting the file, not appending to it. How can I change that index time?

Sorry if I'm asking something super basic; I'm new to all of this. Thanks for the help.

ITWhisperer · 11-30-2022

Essentially, you can't change the index time. Think of it like this: if the file you are monitoring changes, it gets indexed from the point of change. Normally, with log files, this is fine, because log events are appended to the tail of the file, or, at a significant time boundary (e.g. a new day or a new hour), a new file is started, either under a new name or by renaming the existing file and starting a new one with the same name. Splunk looks for these sorts of differences and adds only the new events. When you overwrite the file, as you are doing, Splunk treats it as a new (log) file and indexes it from the beginning, hence the re-indexing of "duplicate" events. Rather than writing your CSV file to an index, you could consider using the outputlookup command to write the values to a lookup.
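A minimal sketch of the outputlookup idea, assuming the file is re-extracted and indexed once per day so that all rows of the newest copy share the same index day: keep only the copy indexed on the most recent day and write it to a lookup. The lookup name file_snapshot.csv and the field list are hypothetical; adjust them to your data, and make sure the search time range covers the latest copy.

```spl
((index="index1")(sourcetype="csv")(source="file.csv"))
| eval index_day=relative_time(_indextime, "@d")
| eventstats max(index_day) AS latest_day
| where index_day=latest_day
| table field field1 field2 field3 field4
| outputlookup file_snapshot.csv
```

Later searches can then read the current snapshot from the lookup instead of the index:

```spl
| inputlookup file_snapshot.csv
| where field="value"
| table field1 field2 field3 field4
```

Because the older copies remain in the index (they are filtered out, not deleted), you can still compare one day's data with another by grouping on index_day, which avoids relying on dedup.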

