I have a RHEL admin who is building two syslog servers to ingest data from one RHEL node redundantly. These two syslog servers will write their data to two Network File System (NFS) mounts. To meet security requirements, we need to forward the data from both of those NFS mounts to our indexers. I am worried about indexing duplicate data and using twice as much license as needed for that data. Is there a way to index the data from one NFS mount at a time and, if it crashes or fails, have a forwarder ready to cut over automatically and continue sending data to be indexed? Please advise.
Splunk on its own does not have any index-time deduplication functionality. And two different sources are... well, just two independent sources, so there is no built-in kind of input which would treat them as one.
You could try implementing your own modular/scripted input which would keep track of the state of each of the NFS-mounted files, but that would mean you'd need to reimplement the monitor input with extra steps.
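To give a rough idea of what such an input would involve, here is a minimal Python sketch. Everything in it is an assumption for illustration, not a Splunk API: the names `choose_source`, `poll` and `line_key` are hypothetical, both NFS copies are assumed to contain byte-identical lines in the same order, and rotation, truncation and hung NFS mounts are not handled. It reads from the first copy that can be opened, tracks a byte offset while the source stays the same, and on cutover uses a hash of the last emitted line to find the resume point in the other copy.

```python
import hashlib


def line_key(line):
    """Fingerprint of one log line, used to find the resume point after a cutover."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def choose_source(paths):
    """Return the first path (in priority order) that can be opened, else None."""
    for path in paths:
        try:
            with open(path, "rb"):
                return path
        except OSError:
            continue
    return None


def poll(paths, state):
    """Return new complete lines from the active copy and update `state` in place.

    `state` is a plain dict holding "source", "offset" and "last_key";
    pass an empty dict on the first run.
    """
    source = choose_source(paths)
    if source is None:
        return []  # neither copy is readable; try again on the next poll

    same_source = state.get("source") == source
    # Same file as last time: continue from the saved byte offset.
    # Different file (cutover or first run): read from the beginning.
    start = state.get("offset", 0) if same_source else 0

    with open(source, "rb") as fh:
        fh.seek(start)
        data = fh.read()

    # Only consume complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").split("\n")[:-1]

    if not same_source and state.get("last_key") is not None:
        # Cutover: skip everything up to the last line already emitted from
        # the other copy. If that line is not found, emit everything, which
        # prefers duplicates over data loss.
        keys = [line_key(line) for line in lines]
        if state["last_key"] in keys:
            last_seen = len(keys) - 1 - keys[::-1].index(state["last_key"])
            lines = lines[last_seen + 1:]

    state["source"] = source
    state["offset"] = start + end
    if lines:
        state["last_key"] = line_key(lines[-1])
    return lines
```

A real scripted input would call `poll` in a loop, print the returned lines to stdout and persist `state` to disk between runs. Note the weak spot: if the same line text appears more than once, matching on a hash can resume at the wrong place, which is part of why this ends up being "monitor with extra steps".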
I'm also not sure what you mean by "two syslog servers to ingest data from RHEL node".
What is it you're trying to achieve here that cannot be achieved with, for example, useACK and a sufficiently large persistent queue?
Rick, as of right now, I don't have any more information about the mysterious two syslog servers. An admin reached out with the scenario, and I couldn't think of any way to prevent duplication from a situation like this. My mind went straight to a custom-scripted input as you described, but I wanted to know if there was a simpler solution. Once the admin sets up what they've described, and I have a more realistic view of the project, I can update this with better specifics. Thank you for the advice.
If the goal is redundancy, then the typical solution is to put a load balancer in front of two syslog servers and let each syslog server forward data as it is received.
Thank you, I'll certainly consider doing that. I'll have to get a little more familiar with load balancers.
Unfortunately, that introduces another SPOF in the form of said LB.
syslog is a very simple solution (won't use the word "protocol" because "syslog" can mean many things) and was never meant to be very robust. Paraphrasing some well-known sayings - "R" in "syslog" stands for reliability.