Is it possible to dedup events before they are indexed?

nijjie
Engager

Using this search, I can find the duplicates after they have been indexed:

index=ets2 source="my_source" | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval delete_id=mvindex(id, 1, -1) | stats count by delete_id | fields - count

I have approximately 500,000 duplicate events every 24 hours. I would like to dedup them before they are indexed. Is this possible?

somesoni2
Revered Legend

I don't think Splunk can identify or remove duplicates during indexing. The options would be to remove the duplicates at the source that generates the log, or to pre-process the log after it is generated and before it is indexed.
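For the pre-processing option, a minimal sketch might look like the Python below. It assumes each event is a single line and that Splunk is pointed at the cleaned output file rather than the raw log; the function names and file paths are hypothetical. Unlike the maxspan=1s search above, it drops an exact repeat no matter how far apart the copies are, and it keeps one fixed-size hash per distinct line in memory (requires Python 3.9+ for the type hints).

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line only the first time its exact content is seen."""
    seen: set[bytes] = set()
    for line in lines:
        # Store a 20-byte SHA-1 digest instead of the full line, so each
        # distinct event costs a fixed amount of memory.
        digest = hashlib.sha1(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def dedup_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping exact duplicate lines.

    Both paths are hypothetical: the idea is to have Splunk monitor
    dst_path instead of the raw log at src_path.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(dedup_lines(src))
```

For example, dedup_file("/var/log/app/raw.log", "/var/log/app/clean.log") would write only first occurrences. Multi-line events would need to be reassembled into single records before hashing.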

hartfoml
Motivator

Interesting question: why is the system writing duplicate logs, or are the timestamps different on each of the logs? This could be a case where the system writes the same log _id every time it finds it, but with a different timestamp. It's not that the machine is making a mistake; rather, the programmer may have told the machine to write the logs in this unusual fashion.
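If the copies really do differ only in their timestamp, an exact-match dedup will not catch them. One rough approach, sketched here in Python, is to strip the leading timestamp and dedup on the remaining payload. The log layout (an ISO-8601-style timestamp followed by a space at the start of each line) is a hypothetical assumption; adjust the pattern to the real format.

```python
import re
from typing import Iterable, Iterator

# Hypothetical layout: "2024-01-01T10:00:00Z id=42 status=ok".
# Matches the date, the time, any trailing fraction/timezone, then whitespace.
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*\s+")


def content_key(line: str) -> str:
    """Return the line without its leading timestamp, used as the dedup key."""
    return TIMESTAMP_PREFIX.sub("", line, count=1).rstrip("\n")


def dedup_ignoring_timestamp(lines: Iterable[str]) -> Iterator[str]:
    """Keep the first line for each payload, even if later copies have a new timestamp."""
    seen: set[str] = set()
    for line in lines:
        key = content_key(line)
        if key not in seen:
            seen.add(key)
            yield line
```

Keeping the first copy preserves the earliest timestamp; whether that is the right one to index depends on why the source repeats the event.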
