
How are identical files from multiple (clustered) systems handled?

afx
Contributor

Hi,
I have an application that logs to a shared clustered file system.
What happens when I install the forwarder (via deployment server, with identical configuration) on each of the nodes to monitor the logs on this file system?
Do I get duplicates for each of the hosts, or can Splunk identify that they are dupes even though they come from different hosts?
Would crcSalt help here?
thx
afx


richgalloway
SplunkTrust

The tracking of duplicate input files is done by the individual forwarders. Since each forwarder does not know what other forwarders have processed, you will get duplicates.
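
To illustrate why, here is a minimal Python sketch of the idea. It assumes each forwarder remembers the files it has read by a checksum of the file's leading bytes, kept in a store that is local to that forwarder. The `Forwarder` class, the `seen` set, and `HEAD_BYTES` are hypothetical names for illustration, not Splunk's actual implementation.

```python
import zlib

HEAD_BYTES = 256  # assumed length of the file head that gets checksummed


class Forwarder:
    """Toy forwarder with its own private record of files already read."""

    def __init__(self, host):
        self.host = host
        self.seen = set()  # local only; never shared with other forwarders

    def monitor(self, content):
        """Return the (host, line) events this forwarder would send."""
        checksum = zlib.crc32(content[:HEAD_BYTES])
        if checksum in self.seen:
            return []  # this forwarder has already read the file
        self.seen.add(checksum)
        return [(self.host, line) for line in content.decode().splitlines()]


# The same file on the shared file system, visible to both nodes.
shared_log = b"app started\napp ready\n"

node_a = Forwarder("node-a")
node_b = Forwarder("node-b")

indexed = node_a.monitor(shared_log) + node_b.monitor(shared_log)
reread = node_a.monitor(shared_log)

print(len(indexed))  # 4: each of the 2 lines arrives once per node
print(len(reread))   # 0: a single forwarder does not re-send its own file
```

Each node's record only protects against that node re-reading the file, so the two-line file ends up indexed twice, once per host.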

---
If this reply helps you, Karma would be appreciated.

afx
Contributor

Drat...
Two ideas:
1: Would forcing an identical hostname help the indexer identify incoming dupes?
2: Using a heavy forwarder in between to filter out dupes.
I really want to avoid #2, that would mean I either add additional burden to a box or need a new box.
thx
afx


richgalloway
SplunkTrust
  1. Indexers do not identify dupes. You can do that at search time, however.
  2. An intermediate HF could probably do that, but it would be a bottleneck and would impair performance. Splunk advises against intermediate forwarders unless absolutely necessary.

What you really should do is avoid having more than one forwarder read a given file.
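
For the search-time route mentioned in point 1, SPL has the `dedup` command (for example `| dedup _raw`). The Python sketch below only illustrates the idea: keep the first event for each distinct raw text, regardless of which host sent it. The event dictionaries and the `dedup_raw` helper are hypothetical.

```python
def dedup_raw(events):
    """Keep the first event for each distinct raw text, ignoring the host."""
    seen = set()
    unique = []
    for event in events:
        if event["_raw"] not in seen:
            seen.add(event["_raw"])
            unique.append(event)
    return unique


# The same two log lines, indexed once per cluster node.
events = [
    {"host": "node-a", "_raw": "app started"},
    {"host": "node-b", "_raw": "app started"},
    {"host": "node-a", "_raw": "app ready"},
    {"host": "node-b", "_raw": "app ready"},
]

unique = dedup_raw(events)
print([e["_raw"] for e in unique])  # ['app started', 'app ready']
```

Keep in mind that this kind of filtering also drops log lines that are genuinely repeated, and the duplicates are still indexed; they are only hidden from the search results.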


afx
Contributor

Yup, avoiding that would be best. I am currently trying to figure out whether the forwarder can be started and stopped together with the application, so there might be some minimal overlap, but overall only one of them is active at a time.
