Splunk Dev

How to remove duplicate event data from index?

zpavic
Path Finder

First of all, I have to post this as a question because I don't have enough karma points to upload an app.

I had some small problems with an input script in Splunk that created duplicate events in my indexes. That is why I wrote a Python script to remove those duplicate events.

Script works for me! Enjoy! 😉

Dark_Ichigo
Builder

You could probably use the dedup command at search time: | dedup _raw

This will remove all duplicate data from your index

Dedup Command

0 Karma

jmorais
Explorer

I downvoted this post because dedup doesn't remove events from the index; it only removes them from the search results.

0 Karma

Dark_Ichigo
Builder

Yep, already been through that. I did realize, however, that Splunk was indexing some files more than once, which was the main cause of the problem.

I stopped using dedup after that.

Thanks 🙂

0 Karma

richprescott
Path Finder

Using dedup on just the _raw data could remove events that are not duplicates. You would need to include the other indexed fields to ensure uniqueness. For example, if you have an error message being logged multiple times and then used dedup just on the _raw data, you would only see one occurrence of that error in Splunk. Including _time would help, but only for that server, so you would also need to include host. Then you run into issues if there are two of the same errors on the same host at the same time, etc. ad infinitum.
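To make the point above concrete, here is a minimal Python sketch of first-occurrence deduplication on different key sets. It is a model of the idea only, not Splunk's implementation: the `Event` tuple, the `dedup` helper, and the sample hosts and messages are all made up for illustration. In SPL, the wider key would correspond to something like `| dedup _time host source _raw`.

```python
from collections import namedtuple

# Hypothetical stand-in for an indexed event; field names are illustrative.
Event = namedtuple("Event", ["time", "host", "source", "raw"])


def dedup(events, key_fields):
    """Keep the first event seen for each distinct combination of key_fields."""
    seen = set()
    kept = []
    for ev in events:
        key = tuple(getattr(ev, field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


events = [
    Event(100, "web01", "/var/log/app.log", "ERROR disk full"),
    Event(100, "web01", "/var/log/app.log", "ERROR disk full"),  # true duplicate
    Event(100, "web02", "/var/log/app.log", "ERROR disk full"),  # other host
    Event(160, "web01", "/var/log/app.log", "ERROR disk full"),  # later occurrence
]

raw_only = dedup(events, ["raw"])
full_key = dedup(events, ["time", "host", "source", "raw"])

print(len(raw_only))  # only one event survives when keying on raw text alone
print(len(full_key))  # the three distinct events survive with the wider key
```

Even the wider key cannot tell apart two genuine events with the same text on the same host, from the same source, in the same timestamp, which is the limit described above.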

0 Karma

jrodman
Splunk Employee

Unfortunately, this script is only correct if you have at most one valid event per second in the targeted index. In other words, any other events in that second are removed by the script, even if they are not duplicates.
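The script itself is not reproduced in this thread, so the following is only a Python model of the behavior described above, assuming the script keeps one event per one-second bucket. The function names and sample events are hypothetical.

```python
def keep_one_per_second(events):
    """Model of the described flaw: one survivor per whole second."""
    seen_seconds = set()
    kept = []
    for timestamp, raw in events:
        second = int(timestamp)
        if second not in seen_seconds:
            seen_seconds.add(second)
            kept.append((timestamp, raw))
    return kept


def keep_one_per_second_and_text(events):
    """Safer variant: an event is a duplicate only if its text also matches."""
    seen = set()
    kept = []
    for timestamp, raw in events:
        key = (int(timestamp), raw)
        if key not in seen:
            seen.add(key)
            kept.append((timestamp, raw))
    return kept


events = [
    (10.1, "login alice"),
    (10.1, "login alice"),   # true duplicate
    (10.7, "login bob"),     # different event in the same second
    (11.2, "logout alice"),
]

per_second = keep_one_per_second(events)
per_second_and_text = keep_one_per_second_and_text(events)

print(per_second)           # "login bob" is lost
print(per_second_and_text)  # "login bob" is kept
```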

0 Karma

John_Mark
Splunk Employee

Wait - the system isn't supposed to prevent you from uploading an app. That's a bug 😞

Let me get on that.

0 Karma

yannK
Splunk Employee

Worked well, thanks.
It can take a long time to run on a large data set, so restricting the time range may be useful.

To generalize to any duplicate (not only within a short time range), I modified the search1 assignment on line 59:

search1 = 'search index=' + index + ' | transaction _raw keepevicted=true | where eventcount>1'
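One way to add the time restriction mentioned above is to put Splunk's `earliest` and `latest` time modifiers into the same search string. This is a sketch only: the helper `build_duplicate_search` and its parameters are hypothetical and not part of the original script, and the index name and time values are examples.

```python
def build_duplicate_search(index, earliest=None, latest=None):
    """Build the duplicate-finding search, optionally bounded in time.

    earliest/latest are passed through as Splunk time modifiers,
    for example "-24h" and "now".
    """
    parts = ['search index=' + index]
    if earliest:
        parts.append('earliest=' + earliest)
    if latest:
        parts.append('latest=' + latest)
    base = ' '.join(parts)
    return base + ' | transaction _raw keepevicted=true | where eventcount>1'


# Unbounded: same string as the modified search1 line.
search1 = build_duplicate_search('main')

# Bounded to the last 24 hours to keep the run time manageable.
search1_last_day = build_duplicate_search('main', earliest='-24h', latest='now')

print(search1)
print(search1_last_day)
```

Running the bounded search over successive windows keeps each run small, but duplicates that straddle two windows would be missed, so windows may need to overlap.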

0 Karma

zpavic
Path Finder

Thanks for approving! 🙂

0 Karma

John_Mark
Splunk Employee

Hi - I saw the app you uploaded and approved it. It will appear in a few minutes.

Sounds like we need to improve the process for uploading apps 🙂

zpavic
Path Finder

I logged out and logged in again, and then I could upload an app, but once I uploaded it, my app disappeared 😞

I will try again tomorrow!

0 Karma