
Db-Sources: Why am I getting duplicated records from a CSV file?

paoli28
Observer

Hi! I'm new to Splunk, so I would really appreciate some help, because I've been stuck on this for several weeks.

I have a CSV file whose source is DB2. When I run the same query in Splunk as in DB2, I can see that Splunk returns duplicated information. For example, in DB2 my query is:

select * from table where field=value

and in Splunk I'm running:

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value"
| table field1 field2 field3 field4

Does anyone know what is happening or how I can solve this? I really don't want to use dedup, because then I may not be able to see how the data changes from day to day.


ITWhisperer
SplunkTrust

Has your file.csv been indexed more than once? You could run something like this to check whether the duplicates have different index times.

((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value" 
| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S")
| table indextime field1 field2 field3 field4
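To see how many copies of the data each load added, a minimal sketch (assuming the file is recreated and re-indexed about once per day) is to count the matching rows by the day they were indexed. If each day returns roughly the same count as the DB2 query, the whole file is being indexed again on every run:

```spl
((index="index1")(sourcetype="csv")(source="file.csv"))
| where field="value"
| eval load_day=strftime(_indextime, "%Y-%m-%d")
| stats count by load_day
```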

paoli28
Observer

Thanks for the reply. Yes, the duplicates have different index times. Does the index time change if I recreate my file each day? Every time I extract from the DB, the file is recreated with the new information; I mean I'm recreating the file, not appending to it. How can I change that index time?

Sorry if I'm asking something super basic; I'm new to all of this. Thanks for the help.


ITWhisperer
SplunkTrust

Essentially, you can't change the index time. Think of it like this: if the file you are monitoring changes, it gets indexed from the point of change. Normally, with log files, this is fine, since log events are written to the tail of the file, or, at a significant time change (e.g. a new day or a new hour), a new file is created, either with a new name, or the existing one is renamed and a new file is started with the same name. Splunk looks for these sorts of differences and adds only the new events. When you overwrite the file, as you are doing in your case, Splunk treats it as a new (log) file and starts indexing from the beginning, hence the re-indexing of "duplicate" events.

Rather than writing your CSV file to an index, you could consider using the outputlookup command to write the values to a lookup.
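One way to follow that suggestion, sketched under assumptions: a scheduled search that keeps only the rows from the most recent load (the events whose index time falls on the latest day) and writes them to a lookup. The lookup name file_lookup.csv is hypothetical, and the search assumes each daily extract is indexed within a single calendar day.

```spl
((index="index1")(sourcetype="csv")(source="file.csv"))
| eval load_day=relative_time(_indextime, "@d")
| eventstats max(load_day) as latest_day
| where load_day=latest_day
| table field1 field2 field3 field4
| outputlookup file_lookup.csv
```

Because outputlookup replaces the lookup file by default, running this once per day keeps the lookup equal to the latest extract, which can then be queried with | inputlookup file_lookup.csv | where field="value". The earlier loads remain in the index, so day-to-day changes can still be compared there without using dedup.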
