I have a situation where I need to collect logs from an application that sits on clustered servers with a drive that moves with the cluster; the logs are stored on this "Active Drive".
I have read about issues whereby Splunk doesn't pick up the drive when a node becomes active, and this is resolved by ensuring that the Splunk service is restarted as part of the fail-over, so I'm OK with that part of the problem.
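For context, the restart is just a step in the fail-over script, along the lines of the sketch below (the install path and script shape are assumptions, not our actual setup):

```sh
#!/bin/sh
# Hypothetical fail-over hook: run on the node that has just taken
# ownership of the "Active Drive". Restarting the UF forces it to
# re-scan its monitored paths, including the newly mounted drive.
SPLUNK_HOME=/opt/splunkforwarder   # assumed install location

"$SPLUNK_HOME/bin/splunk" restart
```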
My current concerns are more around indexing duplicate data.
As each host has its own copy of the Splunk UF and its own fishbucket, I suspect that when fail-over occurs, the newly active host will start indexing from the last point it knows about, but many of those logs will likely have already been indexed by the other host while it was active.
Just wondering if there is a way around this?
Can Splunk be installed on the "Active Drive" and move with the fail-overs, thereby maintaining a single copy of the fishbucket for the clustered instance?
You are correct that the UFs will maintain their own fishbuckets. I can't speak to best practice, but in my own experience the best way to deal with logs on shared directories was to have a dedicated forwarder reading them. That way the forwarder is no longer part of the application's infrastructure, so maintenance on your clustered application will not disrupt the Splunk forwarder.
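As a rough sketch, the dedicated forwarder just monitors the mounted share like any local path; the mount point, index, and sourcetype below are illustrative assumptions:

```ini
# inputs.conf on the dedicated forwarder (illustrative values)
[monitor:///mnt/active_drive/app/logs]
index = app_logs
sourcetype = app:log
disabled = false
```

Because only this one forwarder ever reads those files, there is only one fishbucket tracking them, so fail-overs on the application cluster don't lead to re-indexing.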
Thanks jplum.
Regarding your suggestion above: when you say a dedicated forwarder, do you mean one separate from either of the clustered instances?
How do you manage the connection to the moving drive from this instance to get the logs?
Yes, I mean a separate Splunk instance.
In terms of managing the connection: if I understand you correctly, this "Active Drive" is shared over NFS or something similar. In that case the dedicated host simply has that drive mounted at all times.
Is that how your application works, or is there some other technology in play?
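If it is NFS, the dedicated forwarder host would just keep a permanent mount, something like the sketch below (server name and paths are made up):

```sh
# Mount the shared drive on the dedicated forwarder host (illustrative):
mount -t nfs nfs-server:/active_drive /mnt/active_drive

# Or make it persistent with an /etc/fstab entry such as:
#   nfs-server:/active_drive  /mnt/active_drive  nfs  defaults,_netdev  0  0
```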