Splunk Dev

How to remove duplicate event data from index?

zpavic
Path Finder

First of all, I have to post this as a question because I don't have enough karma points to upload an app.

I had some minor problems with an input script in Splunk that created duplicate events in my indexes. That is why I wrote a Python script to remove those duplicate events.

Script works for me! Enjoy! 😉

Dark_Ichigo
Builder

You could probably use the dedup command at search time: | dedup _raw

This will remove all duplicate data from your index

Dedup Command

0 Karma

jmorais
Explorer

I downvoted this post because dedup doesn't remove events from the index; it only removes them from the search results...

0 Karma

Dark_Ichigo
Builder

Yep, I've already been through that. I did realize, however, that Splunk was indexing some files more than once, which was the main cause of the problem.

I stopped using dedup after that.

Thanks 🙂

0 Karma

richprescott
Path Finder

Using dedup on just the _raw data could remove events that are not duplicates. You would need to include the other indexed fields to ensure uniqueness. For example, if you have an error message being logged multiple times and then use dedup on just the _raw data, you would only see one occurrence of that error in Splunk. Including _time would help, but only for a single server, so you would also need to include host. Then you run into issues if the same error occurs twice on the same host at the same time, and so on, ad infinitum.
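To make the point about key choice concrete, here is a minimal Python sketch. It does not call Splunk at all: the `dedup` helper and the sample events are hypothetical, and the helper only mimics the idea of keeping the first event for each distinct combination of fields.

```python
def dedup(events, fields):
    """Keep the first event seen for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


# Hypothetical sample events, invented for illustration only.
events = [
    {"_time": 100, "host": "web1", "_raw": "ERROR disk full"},
    {"_time": 100, "host": "web1", "_raw": "ERROR disk full"},  # same time, host and text
    {"_time": 100, "host": "web2", "_raw": "ERROR disk full"},  # different host
    {"_time": 160, "host": "web1", "_raw": "ERROR disk full"},  # later recurrence
]

by_raw = dedup(events, ["_raw"])                    # collapses everything to 1 event
by_all = dedup(events, ["_time", "host", "_raw"])   # keeps 3 events

print(len(by_raw), len(by_all))
```

Keying on `_raw` alone drops the event from the other host and the later recurrence, neither of which is a duplicate. Keying on all three fields keeps them, but it still cannot tell whether the second event is a true duplicate or a genuine second error in the same second.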

0 Karma

jrodman
Splunk Employee

Unfortunately, this script is correct only if you have at most one valid event per second in the targeted index. In other words, any other events in that second are removed by the script, even if they are not duplicates.

0 Karma

John_Mark
Splunk Employee

Wait - the system isn't supposed to prevent you from uploading an app. That's a bug 😞

Let me get on that.

0 Karma

yannK
Splunk Employee

Worked well, thanks.
It can take a long time to run if you have a lot of data, so some time restrictions may be useful.

To generalize to any duplicates (not only those in a short time range), I modified the search1 assignment on line 59:

search1 = 'search index=' + index + ' | transaction _raw keepevicted=true | where eventcount>1'
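One way to add the time restrictions mentioned above is to build the search string with optional time bounds. This is only a sketch: the function name `build_duplicate_search` and its parameters are hypothetical and not part of the original script, and it assumes the standard `earliest` and `latest` search time modifiers.

```python
def build_duplicate_search(index, earliest=None, latest=None):
    """Build the duplicate-finding search, optionally limited to a time window.

    `earliest` and `latest` are passed through as Splunk time modifiers,
    for example "-24h" and "now". Both are optional.
    """
    parts = ["search index=" + index]
    if earliest:
        parts.append("earliest=" + earliest)
    if latest:
        parts.append("latest=" + latest)
    base = " ".join(parts)
    return base + " | transaction _raw keepevicted=true | where eventcount>1"


# Without bounds this matches the search1 string above.
search1 = build_duplicate_search("main")

# With bounds, only the last 24 hours are scanned.
search1_recent = build_duplicate_search("main", earliest="-24h", latest="now")
print(search1_recent)
```

Narrowing the window reduces how much data the transaction command has to group, at the cost of missing duplicates that fall outside it.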

0 Karma

zpavic
Path Finder

Thanks for approving! 🙂

0 Karma

John_Mark
Splunk Employee

Hi - I saw the app you uploaded and approved it. It will appear in a few minutes.

Sounds like we need to improve the process for uploading apps 🙂

zpavic
Path Finder

I logged out and logged in again, and then I could upload the app, but after I uploaded it, my app disappeared 😞

I will try tomorrow!

0 Karma