Getting Data In

How to avoid ingesting the same log with the same content into Splunk from different servers

narmadak
Engager

Hello,

I have 10 servers for the same purpose. If one server is down, the others remain active so there is no loss of business continuity.

ABC.log is generated on all the servers with the same content. We need to add all 10 servers to serverclass.conf, and we did so. But ABC.log is getting into Splunk multiple times, i.e., each event is repeated 5 to 6 times.

I'd appreciate any help to avoid multiple ingestion of the same log from different servers, or to otherwise avoid duplicate events.

I added crcSalt in inputs.conf, but it's not working.

Thanks 

1 Solution

richgalloway
SplunkTrust
SplunkTrust

The crcSalt setting helps only with a single monitored file. Splunk has no way of knowing whether the data from several different servers is duplicated or not. For all it knows, the same event hit all of the servers at about the same time.

The workaround is to remove duplicates at search time.
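For example, assuming the raw event text is byte-for-byte identical across servers (index and source names here are illustrative), something like this collapses the copies at search time:

```
index=your_index source="*ABC.log"
| dedup _raw
```

Deduplicating on `_raw` can be expensive on large result sets; restricting the time range and source first keeps it manageable.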

---
If this reply helps you, Karma would be appreciated.


ldongradi_SPL
Splunk Employee
Splunk Employee

I can see only one way to do this, and it's ugly...

Mount the 10 file systems from the 10 servers over NFS onto one single UF.

That way, the fishbucket will treat the file as identical across the 10 different paths, and it will be indexed only once.
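As a sketch, assuming the 10 NFS mounts land under paths like /mnt/app01 through /mnt/app10 (paths and index/sourcetype names illustrative), a single monitor stanza on that UF could cover them all:

```
[monitor:///mnt/app*/logs/ABC.log]
index = your_index
sourcetype = abc_log
```

Since the fishbucket keys on the file's content CRC rather than its path, the identical file seen through the 10 mounts should be indexed only once.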


PickleRick
SplunkTrust
SplunkTrust

Unfortunately, deduplicating at search time does not solve possible license consumption issues: the duplicate copies are still indexed and still count against the license.

It's a very unusual case and there is no ready-made solution for this (at least none that I know of). Splunk on its own does not implement deduplication on inputs. Also, you have to remember that "the same" event could be ingested from different sources/forwarders and get sent to other peers in a cluster. Splunk would have no way of knowing that the event is duplicated.

A slightly ugly solution would be to create your own modular input reading events from all those sources and performing deduplication. But that would mean creating a SPOF in your infrastructure, since you'd need a single collection point.
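A minimal sketch of the deduplication core such a collector could use (the class and parameter names are hypothetical, and the actual modular-input wiring via the Splunk SDK is omitted): hash each event's content and drop events whose hash was already seen within a time window.

```python
import hashlib
import time


class EventDeduplicator:
    """Drops events whose content hash was already seen recently.

    window_secs bounds memory use: hashes older than the window are
    evicted, on the assumption that duplicate copies of an event from
    the 10 servers all arrive within that window.
    """

    def __init__(self, window_secs=300):
        self.window_secs = window_secs
        self.seen = {}  # content hash -> first-seen timestamp

    def _evict(self, now):
        # Forget hashes older than the dedup window.
        expired = [h for h, ts in self.seen.items() if now - ts > self.window_secs]
        for h in expired:
            del self.seen[h]

    def is_new(self, event, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        digest = hashlib.sha256(event.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False  # duplicate copy from another server
        self.seen[digest] = now
        return True


# Feed events from all 10 servers through one deduplicator and
# forward only those for which is_new() returns True.
dedup = EventDeduplicator(window_secs=300)
events = [
    "2024-01-01 ERROR disk full",
    "2024-01-01 ERROR disk full",  # duplicate from a second server
    "2024-01-01 INFO ok",
]
unique = [e for e in events if dedup.is_new(e)]
print(unique)  # the duplicate second copy is dropped
```

The window is the key design choice: too short and late-arriving duplicates slip through; too long and memory grows with event volume.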

Another possibility (albeit ugly as hell and even more license-consuming than the original approach) would be to ingest the events initially into a temporary index and then periodically deduplicate them at search time, collecting the results into a destination index. Ugh.
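As a sketch, that could be a scheduled search over the temporary index that writes deduplicated results to the destination index (index names and the 15-minute schedule are illustrative):

```
index=temp_abc earliest=-15m@m latest=@m
| dedup _raw
| collect index=final_abc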
