
Is it possible to dedup data during indexing?

tylr
Engager

I'm feeding Splunk a large quantity of historical gzipped syslog files from many different machines through a single TCP listener input. These archived files almost certainly contain overlapping data, and new data may come in that overlaps with the old data. I can filter my search results to hide the duplicated data, but is it possible to strip duplicate lines at index time?

1 Solution

Stephen_Sorkin
Splunk Employee

No, that is not possible.
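
Since the duplicates cannot be stripped at index time, one option is to drop them before the data reaches the TCP input. The following is only a minimal sketch under stated assumptions, not a Splunk feature: the helper names `unique_lines` and `unique_lines_from_gzip` are hypothetical, a duplicate is assumed to mean a byte-for-byte identical line (timestamp and host included), and the set of seen hashes is assumed to fit in memory. Catching overlap with data sent in an earlier run would additionally require persisting that set between runs.

```python
import gzip
import hashlib
from typing import Iterable, Iterator, Optional, Set


def unique_lines(lines: Iterable[str],
                 seen: Optional[Set[bytes]] = None) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order.

    `seen` holds SHA-1 digests of lines already emitted; pass the same
    set to several calls to deduplicate across multiple sources.
    """
    if seen is None:
        seen = set()
    for line in lines:
        line = line.rstrip("\r\n")
        # Store a fixed-size digest instead of the full line to bound memory.
        digest = hashlib.sha1(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def unique_lines_from_gzip(paths: Iterable[str],
                           seen: Optional[Set[bytes]] = None) -> Iterator[str]:
    """Read gzipped text files in order and yield lines not seen before."""
    if seen is None:
        seen = set()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from unique_lines(handle, seen)
```

The lines yielded by `unique_lines_from_gzip` could then be written to the existing TCP listener in place of the raw archive contents. Exact-match hashing is a deliberate simplification: two events that differ only in whitespace or timestamp formatting would both be kept.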


ncsantucci
Path Finder

A similar scenario, with logrotate compressing and rotating logs, is discussed at http://answers.splunk.com/answers/121267/how-does-splunk-handle-nix-logrotate-based-log-rotation


