Developing for Splunk Enterprise

How to remove duplicate event data from index?

Path Finder

First of all, I need to post this as a question because I don't have enough karma points to upload an app.

I had some minor problems with a scripted input in Splunk that created duplicate events in my indexes. That is why I wrote a Python script to remove those duplicate events.

Script works for me! Enjoy! 😉

Splunk Employee

Wait - the system isn't supposed to prevent you from uploading an app. That's a bug 😞

Let me get on that.

Path Finder

I logged out and logged in again, and then I could upload the app, but after I uploaded it, it disappeared 😞

I will try tomorrow!

Splunk Employee

Hi - I saw the app you uploaded and approved it. It will appear in a few minutes.

Sounds like we need to improve the process for uploading apps 🙂

Path Finder

Thanks for approving! 🙂

Splunk Employee

Worked well, thanks.
It can take a long time to run if your data set is large, so restricting the search to a time range may be useful.

To generalize to any duplicate (not only duplicates within a short time range), I modified the search1 assignment on line 59:

search1 = 'search index=' + index + ' | transaction _raw keepevicted=true | where eventcount>1'
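
Building on the time-restriction suggestion, here is a minimal sketch of how the search1 string could be assembled with an optional time window. `build_duplicate_search` is a hypothetical helper, not part of the original script; `earliest` and `latest` are Splunk's standard search time modifiers, so passing them limits how much data each run has to scan.

```python
from typing import Optional


def build_duplicate_search(index: str,
                           earliest: Optional[str] = None,
                           latest: Optional[str] = None) -> str:
    """Hypothetical helper: build the modified search1 string from this thread,
    optionally bounded by SPL earliest/latest time modifiers."""
    terms = ["search", "index=" + index]
    if earliest is not None:
        terms.append("earliest=" + earliest)
    if latest is not None:
        terms.append("latest=" + latest)
    # Same pipeline as the modified search1: group identical _raw values and
    # keep only groups that contain more than one event.
    return " ".join(terms) + " | transaction _raw keepevicted=true | where eventcount>1"


print(build_duplicate_search("main"))
print(build_duplicate_search("main", earliest="-24h", latest="now"))
```

Running it in smaller windows (for example one day at a time) is one way to keep each search short on large indexes.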

Splunk Employee

Unfortunately, this script is only correct if the targeted index contains at most one valid event per second. In other words, the script removes every other event that falls in the same second, even if it is not a duplicate.
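
The failure mode can be reproduced with a toy model. The original script's code isn't shown in this thread, so the Python below is a hypothetical sketch that assumes the cleanup keeps only the first event in each one-second bucket. It compares the events that rule would delete with the events that are actual duplicates, here taken to mean the same _raw in the same second.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (epoch seconds, _raw)


def removed_if_one_per_second(events: List[Event]) -> List[Tuple[int, str]]:
    """Events a hypothetical 'keep only the first event per second' cleanup would delete."""
    by_second: Dict[int, List[str]] = defaultdict(list)
    for ts, raw in events:
        by_second[int(ts)].append(raw)
    removed = []
    for second, raws in by_second.items():
        # Everything after the first event in each second is dropped.
        removed.extend((second, raw) for raw in raws[1:])
    return removed


def true_duplicates(events: List[Event]) -> List[Tuple[int, str]]:
    """Events whose (second, _raw) pair was already seen earlier."""
    seen = set()
    dups = []
    for ts, raw in events:
        key = (int(ts), raw)
        if key in seen:
            dups.append(key)
        else:
            seen.add(key)
    return dups


events = [
    (1700000000.1, "user=alice action=login"),
    (1700000000.1, "user=alice action=login"),  # re-indexed copy
    (1700000000.7, "user=bob action=login"),    # different event, same second
]
print(removed_if_one_per_second(events))
print(true_duplicates(events))
```

Bob's login is deleted by the per-second rule even though only Alice's second login is a copy.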

Builder

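You could probably use the dedup command at search time: | dedup _raw

This removes duplicate events from your search results. Note that it only filters what the search returns; the duplicate events remain in the index.

See the Splunk documentation for the dedup command.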
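
A rough Python model of `| dedup _raw` may help here. The `dedup` function below is a hypothetical illustration, not Splunk's implementation: it keeps the first result for each distinct value in the order results arrive and returns a filtered copy, leaving the input list (standing in for the indexed events) unchanged. Because Splunk returns events newest first by default, the copy that survives a real dedup is usually the most recent one.

```python
from typing import Dict, List


def dedup(results: List[Dict[str, str]], *fields: str) -> List[Dict[str, str]]:
    """Hypothetical model of `| dedup <fields>`: keep the first result seen for
    each combination of field values and drop later ones.

    Assumes every result has all the listed fields. Returns a new list; the
    input list is never modified.
    """
    seen = set()
    kept = []
    for result in results:
        key = tuple(result[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(result)
    return kept


# Newest first, as a default Splunk search would return them.
indexed = [
    {"_time": "1700000120", "_raw": "GET /index.html 200"},
    {"_time": "1700000060", "_raw": "GET /index.html 200"},
    {"_time": "1700000000", "_raw": "GET /about.html 200"},
]
shown = dedup(indexed, "_raw")
print(len(shown), len(indexed))
```

The search shows two events, while all three are still stored.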

Path Finder

Using dedup on _raw alone could remove events that are not duplicates. You would need to include other indexed fields to ensure uniqueness. For example, if an error message is logged multiple times and you dedup on _raw alone, you would see only one occurrence of that error in Splunk. Including _time would help, but only for a single server, so you would also need to include host. Then you run into issues if the same error occurs twice on the same host at the same time, and so on ad infinitum.
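
To make the trade-off concrete, here is a small Python model (hypothetical events, not Splunk code) that counts how many events survive when deduplicating on progressively larger sets of fields. In SPL the corresponding searches would be `| dedup _raw`, `| dedup _time _raw`, and `| dedup _time host _raw`.

```python
from typing import Dict, List, Sequence


def count_kept(events: List[Dict[str, str]], fields: Sequence[str]) -> int:
    """How many events survive deduplicating on `fields` (one per distinct key)."""
    return len({tuple(event[f] for f in fields) for event in events})


# Hypothetical events: the same error logged several times.
events = [
    {"_time": "1700000000", "host": "web01", "_raw": "ERROR disk full"},
    {"_time": "1700000060", "host": "web01", "_raw": "ERROR disk full"},  # a minute later
    {"_time": "1700000000", "host": "web02", "_raw": "ERROR disk full"},  # another host
    {"_time": "1700000000", "host": "web01", "_raw": "ERROR disk full"},  # same host, same second
]

for fields in (["_raw"], ["_time", "_raw"], ["_time", "host", "_raw"]):
    print(fields, count_kept(events, fields))
```

Each added field keeps more genuine events, but the last event is the case the post warns about: no combination of these fields can tell a re-indexed copy from a genuine second occurrence in the same second.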

Builder

Yep, I've already been through that. I did realize, however, that Splunk was indexing some files more than once, which was the main cause of the problem.

I stopped using dedup after that.

Thanks 🙂
