Solved: Is it possible to dedup data during indexing?

tylr · ‎02-19-2011

I'm feeding Splunk a large quantity of historical gzipped syslog files for many, many different machines through a single TCP listener input. These archived files almost certainly contain overlapping data. Furthermore, new data may arrive that overlaps with the old data. I can filter my search results to hide the duplicated data, but is it possible to strip duplicate lines at index time?

Stephen_Sorkin · ‎02-19-2011

No, that is not possible.
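Since Splunk cannot drop duplicates at index time, one workaround is to remove them before the data reaches the TCP input. The Python script below is a minimal sketch of that idea, not a Splunk feature. It assumes the archives are gzipped, newline-delimited syslog files that you can preprocess, and that two lines count as duplicates only if they are byte-for-byte identical. The script name `dedup_syslog.py` and the function `unique_lines` are hypothetical.

```python
# dedup_syslog.py (hypothetical name): stream unique lines from gzipped syslog archives.
import gzip
import hashlib
import sys
from typing import Iterable, Iterator


def unique_lines(paths: Iterable[str]) -> Iterator[bytes]:
    """Yield each distinct line once, in first-seen order, across all given .gz files."""
    # Store 16-byte digests instead of full lines to keep memory bounded per distinct line.
    seen: set[bytes] = set()
    for path in paths:
        with gzip.open(path, "rb") as fh:
            for line in fh:
                line = line.rstrip(b"\r\n")  # normalise line endings before comparing
                digest = hashlib.blake2b(line, digest_size=16).digest()
                if digest in seen:
                    continue  # exact duplicate of an earlier line: skip it
                seen.add(digest)
                yield line


def main() -> None:
    # Write the deduplicated stream to stdout so it can be piped to the TCP input.
    out = sys.stdout.buffer
    for line in unique_lines(sys.argv[1:]):
        out.write(line + b"\n")


if __name__ == "__main__":
    main()
```

Example usage, with a hypothetical host and port: `python3 dedup_syslog.py /archive/*.gz | nc splunk-indexer 5514`. The script only deduplicates within a single run, so new data that overlaps events already indexed still has to be filtered at search time.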

ncsantucci · ‎05-23-2014

A similar scenario arises when logrotate compresses and rotates logs; see http://answers.splunk.com/answers/121267/how-does-splunk-handle-nix-logrotate-based-log-rotation