Splunk Search

How to remove duplicate events from the index, not just at search time?

jadengoho
Builder

I have a lot of data that includes duplicate events, and I want to remove the duplicates from the index itself. I don't want to use the dedup command, since it only removes events from the search results, not from the index. Can somebody help me?


niketn
Legend

@jadengoho, are these duplicates old, or will your data keep having duplicates in the future as well? If there will be more duplicates, what is their source, cause, and frequency?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

jadengoho
Builder

These are daily logs, so duplicate data is a problem because the duplicates just keep stacking up.


niketn
Legend

If you can fix the data during ingestion, that would be best. Otherwise, you can run a daily scheduled search (timed to run after the data is ingested) that lists the day's data with dedup and pushes it to a separate index.

Refer to Splunk Documentation: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#Moving_events_to_a_diffe...

PS:
You can use the collect command to do this; however, to me it seems like overhead compared with fixing the problem prior to indexing.
You can also consider a scripted input if there is no other means of preventing duplicate events from being indexed.
With the collect command, if you define a sourcetype other than stash, it will count against your license.
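
A minimal sketch of the scheduled search described above, assuming the duplicates are exact copies of the raw event text. The names my_index, my_sourcetype, and my_dedup_index are placeholders that do not come from this thread, and the destination index has to exist before collect can write to it:

```spl
index=my_index sourcetype=my_sourcetype earliest=-1d@d latest=@d
| dedup _raw
| collect index=my_dedup_index
```

The time range covers the previous full day, so the search would be scheduled once daily after ingestion finishes. Leaving the sourcetype unset keeps the default of stash, which avoids the extra license usage mentioned above.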


mjlsnombrado
Communicator

I have the same problem. Do I need to use a script to fix this issue? If so, what kind of script should I use?


nickhills
Ultra Champion

You will need to create a search that finds your duplicated data and returns all but the last copy (or the first, depending on your needs).
Once you are happy that your search correctly identifies ONLY the duplicated events, you can pipe the results to | delete, which will remove the data from the indexes.

You will need to be a user with the can_delete role, which grants the delete_by_keyword capability. No user has this by default (not even admin), so you may need to add it to your user first. It's also a good idea to remove it when you have finished to prevent accidents! (been there)

It's worth noting that this will not remove the data from disk. It simply marks it as deleted in the buckets, so it won't be returned in future searches.
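
As a rough sketch of the kind of search described above, assuming duplicates are events whose raw text is identical (my_index and my_sourcetype are placeholder names). Because results come back newest first, copy_number is 1 for the most recent copy and higher for each older one, so this lists every copy except the most recent:

```spl
index=my_index sourcetype=my_sourcetype earliest=-1d@d latest=@d
| streamstats count AS copy_number BY _raw
| where copy_number > 1
```

Run it on its own first and compare the results with a plain dedup over the same time range. Only once it returns nothing but the unwanted copies would you append | delete to the end.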
