Import data without duplicates

alucarddjin

I have a gap in my data. I've been given a new set of data to fill it, but the raw file contains some events that are already in Splunk, so I need a way to import only the non-duplicate data.

So far I've imported the new data into a separate index and used a query to remove the items that are already in the main index, then tried the collect command to put the results into another index (a dummy one to start with, so I don't mess up my main index). However, when the data is copied, some of the date fields are turned into epoch values and the _time field isn't picked up correctly.

Current code:

    (index=main sourcetype="sourcetype1") OR (index=sourceindex)
    | eventstats count by deviceCustomDate1 fileName 
    | search count=1
    | collect index=destinationindex sourcetype=sourcetype1

Then, if I look at the results for one of the records in both indexes, I get this:

  _time                 deviceCustomDate1         index
  2019-05-16 23:47:21   2019/05/16 22:47:21 UTC   sourceindex
  2019-05-17 00:03:29   1558046841000             destinationindex
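
For reference, 1558046841000 appears to be the same instant written as epoch milliseconds: 1558046841 seconds after 1970-01-01 00:00:00 UTC is 2019-05-16 22:47:21 UTC. A minimal sketch for checking this in the dummy index, assuming deviceCustomDate1 is extracted there as a plain number (deviceCustomDate1_check is only an illustrative field name, and strftime renders the result in the searching user's time zone, so the displayed hour may differ from UTC):

    index=destinationindex sourcetype=sourcetype1
    | eval deviceCustomDate1_check=strftime(deviceCustomDate1/1000, "%Y/%m/%d %H:%M:%S %Z")
    | table _time deviceCustomDate1 deviceCustomDate1_check index

If the check column matches the value shown for sourceindex, the collected field holds the same timestamp and only its representation has changed.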

Am I missing something? Is collect the right tool to use?

Thanks in advance.
