First of all, I need to ask a question here, because I don't have enough karma points to upload an app.
I had some problems with an input script in Splunk that created duplicate events in my indexes. That is why I wrote a Python script to remove those duplicate events.
The script works for me! Enjoy! 😉
Hi - I saw the app you uploaded and approved it. It will appear in a few minutes.
Sounds like we need to improve the process for uploading apps 🙂
Worked well, thanks.
It can take a long time to run if your data is large; adding some time restrictions may be useful.
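For example, a minimal sketch of such a restriction (assuming the script builds its search string in Python, as in the search1 line quoted in the next comment; the 24-hour window is just an illustration), using the SPL earliest/latest time modifiers:
search1 = 'search index=' + index + ' earliest=-24h latest=now'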
To generalize to any duplicate (not only within a short time range), I modified search1 at line 59:
search1 = 'search index=' + index + ' | transaction _raw keepevicted=true | where eventcount>1'
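Here the transaction _raw keepevicted=true groups events with identical raw text regardless of how far apart they are, and where eventcount>1 keeps only the groups that actually contain duplicates. For anyone adapting this outside the app, a minimal sketch of running that generalized search with the Splunk Python SDK (splunklib); the connection details below are placeholders, and the app's actual removal logic is not shown:
import splunklib.client as client
import splunklib.results as spl_results

# Hypothetical connection details; replace with your own instance.
service = client.connect(host='localhost', port=8089, username='admin', password='changeme')

index = 'main'  # placeholder target index
search1 = 'search index=' + index + ' | transaction _raw keepevicted=true | where eventcount>1'

# Run the search synchronously and print each group of duplicate events it finds.
job = service.jobs.create(search1, exec_mode='blocking')
for result in spl_results.ResultsReader(job.results()):
    print(result)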
Unfortunately, this script is only correct if you have at most one valid event per second in the targeted index. In other words, any other events in that second are removed by the script, even if they are not duplicates.
Using dedup on just the _raw data could remove events that are not duplicates; you would need to include the other indexed fields to ensure uniqueness. For example, if you have an error message being logged multiple times and then use dedup on just the _raw data, you would only see one occurrence of that error in Splunk. Including _time would help, but only for that server, so you would also need to include host. Then you run into issues if there are two of the same errors on the same host at the same time, etc., ad infinitum.
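For example, a search-time dedup with a wider key might look like this (the index name is a placeholder):
index=main | dedup _raw, _time, host
Even that still collides when two identical errors share a host and timestamp, per the point above.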
Yep, I have already been through that. I did realize, however, that Splunk was indexing some files more than once, which was the main cause of the problem.
I stopped using dedup after that.