Avoiding double ingestion

Jtorge
Explorer

I have a RHEL admin who is building two syslog servers to ingest data from one RHEL node redundantly. These two syslog servers will write their data to two Network File System (NFS) mounts. To meet security requirements, we need to forward the data from both of those NFS mounts to our indexers. I am worried about indexing duplicate data and using twice as much license as needed for that data. Is there a way to index the data from one NFS mount at a time and, if it crashes or fails, have a forwarder ready to cut over automatically and continue sending data to be indexed? Please advise.

 

PickleRick
SplunkTrust

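Splunk on its own does not have any index-time deduplication functionality (the "dedup" search command only works on data that has already been indexed and counted against the license). And two different sources are... well, just two independent sources, so there is no built-in input type that would treat them as one.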

You could try implementing your own modular or scripted input that keeps track of the state of each NFS-mounted file, but that would mean reimplementing the monitor input with extra steps.
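A minimal sketch of what such a scripted input could look like, in Python. It assumes both NFS copies end up containing byte-identical lines, which is not guaranteed if either syslog server rewrites headers. The class name DedupTailer and the window size are hypothetical. It keeps a byte offset per file, much as the monitor input does, and emits a line only if its hash has not been seen recently, so either copy can disappear without leaving a gap. Offsets live in memory only, so a restart re-reads both files from the beginning, and genuinely repeated messages inside the window are collapsed as well. That is exactly the "monitor input with extra steps" problem.

```python
import hashlib
import os
from collections import OrderedDict
from typing import Dict, List


class DedupTailer:
    """Hypothetical helper: tails several copies of the same log, emitting each line once."""

    def __init__(self, paths: List[str], window: int = 100_000) -> None:
        self.paths = paths
        self.offsets: Dict[str, int] = {p: 0 for p in paths}  # bytes consumed per file (memory only)
        self.window = window  # how many recent line hashes to remember
        self.seen: "OrderedDict[bytes, None]" = OrderedDict()

    def _remember(self, digest: bytes) -> bool:
        """Return True if digest is new; record it and evict the oldest hash when full."""
        if digest in self.seen:
            return False
        self.seen[digest] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)
        return True

    def poll(self) -> List[str]:
        """Read new complete lines from every reachable copy and return the unseen ones."""
        out: List[str] = []
        for path in self.paths:
            try:
                if os.path.getsize(path) < self.offsets[path]:
                    self.offsets[path] = 0  # file shrank (truncated or rotated): start over
                with open(path, "rb") as fh:
                    fh.seek(self.offsets[path])
                    chunk = fh.read()
            except OSError:
                continue  # copy unreachable (e.g. NFS mount down): rely on the others
            # Consume only up to the last newline so a half-written line is retried later.
            end = chunk.rfind(b"\n") + 1
            if end == 0:
                continue
            self.offsets[path] += end
            for raw in chunk[:end].split(b"\n")[:-1]:
                if self._remember(hashlib.sha256(raw).digest()):
                    out.append(raw.decode("utf-8", errors="replace"))
        return out
```

As a scripted input it would run in a loop: call poll(), print each returned line to stdout for Splunk to pick up, then sleep for a few seconds.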

I'm also not sure what you mean by "two syslog servers to ingest data from RHEL node".

What is it you're trying to achieve here that cannot be achieved with, for example, useACK and a sufficiently large persistent queue?
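To illustrate the idea behind that suggestion, here is a toy Python model of acknowledged forwarding with a bounded queue. It is not Splunk's implementation; the class AckingSender and its methods are hypothetical. An event leaves the queue only after the receiver acknowledges it, so an unreachable indexer delays delivery instead of losing data, and the queue size determines how long an outage can last before new events are refused.

```python
from collections import deque
from typing import Callable, Deque


class AckingSender:
    """Toy model of at-least-once forwarding: events stay queued until acknowledged."""

    def __init__(self, max_queue: int) -> None:
        self.max_queue = max_queue  # stand-in for a bounded (persistent) queue
        self.pending: Deque[str] = deque()

    def enqueue(self, event: str) -> bool:
        """Queue an event; return False when the queue is full and it cannot be accepted."""
        if len(self.pending) >= self.max_queue:
            return False
        self.pending.append(event)
        return True

    def flush(self, send: Callable[[str], bool]) -> int:
        """Send queued events in order, removing each only after send() acknowledges it."""
        delivered = 0
        while self.pending:
            try:
                acked = send(self.pending[0])
            except ConnectionError:
                break  # receiver down: keep everything queued and retry later
            if not acked:
                break  # no acknowledgement: keep the event and retry later
            self.pending.popleft()
            delivered += 1
        return delivered
```

In Splunk itself the corresponding settings are useACK in outputs.conf and persistent queues (for example persistentQueueSize) in inputs.conf; check the documentation for which input types support them. Any acknowledge-and-resend scheme is at-least-once: if an acknowledgement is lost after the data was stored, the resend produces a duplicate.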

Jtorge
Explorer

Rick, as of right now, I don't have any more information about the mysterious two syslog servers. An admin reached out with the scenario, and I couldn't think of any way to prevent duplication in a situation like this. My mind went straight to a custom scripted input as you described, but I wanted to know if there was a simpler solution. Once the admin sets up what they've described and I have a more realistic view of the project, I can update this with better specifics. Thank you for the advice.

richgalloway
SplunkTrust

If the goal is redundancy, then the typical solution is to put a load balancer in front of two syslog servers and let each syslog server forward data as it is received.
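A minimal Python sketch of the routing decision such a load balancer makes, assuming an active/passive policy (priority order) and a plain TCP connect as the health check. The function names, host names and port 514 are assumptions for illustration. Each message goes to exactly one syslog server, which is why nothing gets written, and later indexed, twice.

```python
import socket
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Backend = Tuple[str, int]  # (host, port) of a syslog server


def tcp_alive(backend: Backend, timeout: float = 1.0) -> bool:
    """Crude health check: can a TCP connection to the backend be opened?"""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(backends: Sequence[Backend],
                 is_healthy: Callable[[Backend], bool]) -> Optional[Backend]:
    """Active/passive: return the first healthy backend in priority order."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


def route(messages: List[str], backends: Sequence[Backend],
          is_healthy: Callable[[Backend], bool]) -> Dict[Backend, List[str]]:
    """Assign every message to exactly one backend; drop it if none is healthy."""
    assigned: Dict[Backend, List[str]] = {}
    for msg in messages:
        target = pick_backend(backends, is_healthy)
        if target is not None:
            assigned.setdefault(target, []).append(msg)
    return assigned
```

For example, route(msgs, [("syslog1.example.com", 514), ("syslog2.example.com", 514)], tcp_alive) sends everything to the first server while it answers and switches to the second when it does not. A real load balancer runs its health checks on a timer rather than once per message.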

Jtorge
Explorer

Thank you, I'll certainly consider doing that. I'll have to get a little more familiar with load balancers. 

PickleRick
SplunkTrust

Unfortunately, that introduces another single point of failure (SPOF): the load balancer itself.

Syslog is a very simple solution (I won't use the word "protocol", because "syslog" can mean many things) and was never meant to be very robust. To paraphrase a well-known saying, the "R" in "syslog" stands for reliability.
