Getting Data In

How to exclude duplicate data while onboarding data in the scenario below

vikram1583
Explorer

I have a Python script that runs daily and saves its output to a CSV file.

For example, if I run the script today, it fetches the data from April 1st to today's date (04/21/2021); if I run it tomorrow, it fetches the data from April 1st to tomorrow's date (04/22/2021). Each run writes to a different file name.

I want to onboard this data into Splunk without duplicates.

How can we do that?

We have a field called start_time, which we use as the timestamp field.

For example: start_time = 04/21/2021 10.30

Another example: start_time = 04/22/2021 10.30


Thanks in advance
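One way to keep overlapping rows out of Splunk (and off the license) is to filter them before indexing. The sketch below is a minimal, hypothetical approach, not a Splunk feature: it assumes the script's CSV has a `start_time` column in `MM/DD/YYYY HH.MM` format, as in the examples above, and keeps a small checkpoint file holding the latest `start_time` already sent. Only rows newer than that checkpoint are written to a delta file for Splunk to monitor. The helper names `read_checkpoint` and `write_new_rows` and all file paths are placeholders.

```python
import csv
import os
from datetime import datetime

# Assumed format of start_time values, e.g. "04/21/2021 10.30".
TIME_FORMAT = "%m/%d/%Y %H.%M"


def read_checkpoint(path):
    """Return the latest start_time already sent to Splunk, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        text = f.read().strip()
    return datetime.strptime(text, TIME_FORMAT) if text else None


def write_new_rows(full_csv, delta_csv, checkpoint_path):
    """Copy rows newer than the checkpoint from full_csv into delta_csv.

    Hypothetical helper: full_csv is the script's cumulative output and
    delta_csv is the file Splunk monitors. Returns the number of rows written.
    """
    last_seen = read_checkpoint(checkpoint_path)
    with open(full_csv, newline="") as src:
        reader = csv.DictReader(src)
        rows = [
            row for row in reader
            if last_seen is None
            or datetime.strptime(row["start_time"], TIME_FORMAT) > last_seen
        ]
        fieldnames = reader.fieldnames

    if not rows:
        return 0  # nothing new; leave the delta file and checkpoint untouched

    with open(delta_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Advance the checkpoint to the newest start_time just written.
    newest = max(datetime.strptime(r["start_time"], TIME_FORMAT) for r in rows)
    with open(checkpoint_path, "w") as f:
        f.write(newest.strftime(TIME_FORMAT))
    return len(rows)
```

Splunk would then monitor only the delta files, ideally with a new file name per run as the script already does, so each row is indexed once. The strict `>` comparison assumes new rows always carry a later start_time than anything already sent; if rows can share a timestamp across runs, a finer-grained key would be needed.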


venkatasri
Influencer

Hi,

Splunk avoids re-indexing duplicate data out of the box. Have you configured the monitor inputs? If so, please share your inputs.conf and some sample data files.


venkatasri
Influencer

Hi @vikram1583 

What does the data look like in both files? Does it change every time the script runs?

Alternatively, index both files and remove duplicates at search time using Splunk commands such as dedup or dc, depending on your use case.


vikram1583
Explorer

Hi @venkatasri, thanks for your response. It's not just about two files: I will run the script every day, and if I ingest all of those files every day, license usage will increase. I just want to ingest the new data.


vikram1583
Explorer

The data for previous dates stays the same; the script just adds new data for the current date.
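Because rows for earlier dates do not change between runs, another option is to compare each new file with the previous run's file and keep only the lines that were not there before. This is a minimal sketch under that assumption, not a Splunk feature; the helper name `write_delta` and the file names are hypothetical, and it assumes the first line of each CSV is a header.

```python
def write_delta(previous_csv, current_csv, delta_csv):
    """Write the header plus rows of current_csv that are absent from previous_csv.

    Assumes rows for earlier dates are byte-for-byte identical between runs.
    Returns the number of new data rows written (0 means no file is written).
    """
    with open(previous_csv, newline="") as f:
        seen = set(f.read().splitlines())
    with open(current_csv, newline="") as f:
        lines = f.read().splitlines()
    if not lines:
        return 0  # empty current file: nothing to send

    header, body = lines[0], lines[1:]
    new_rows = [line for line in body if line not in seen]
    if new_rows:
        with open(delta_csv, "w", newline="") as f:
            f.write("\n".join([header] + new_rows) + "\n")
    return len(new_rows)
```

Line-level comparison breaks if the script reorders rows, reformats old values, or emits quoted fields containing line breaks, and it also drops a new row that is identical to an old one; in those cases comparing on a key column such as start_time would be more robust.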
