We have an application that runs on two servers: one is the active server and the other is a hot standby, so if one fails the other picks up automatically. We can also force a failover as part of normal maintenance tasks.
The problem is that the application generates logs on the currently active server, but the log directory is periodically synchronized so that both machines hold a full set of history. That way, if one ever goes down catastrophically, we can recover.
Setting up a Splunk Universal Forwarder on each of the machines will send 2 copies of the logs to Splunk.
Is there some method people have used to stop ingesting duplicate log files/entries from what is essentially 2 separate systems?
The best that you can do is to schedule a search like this to run every hour over the last hour to delete the duplicates. It does not save you any license, but it should speed up your searches and confuse people less. Append it to your own base search:
| streamstats count AS _serial BY _raw | search _serial>1 | delete
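Before scheduling a destructive search like the one above, it may be worth previewing what it would remove. The sketch below is only an illustration: `your_index` and `your_sourcetype` are placeholder names, not anything from this thread, and the one-hour window just mirrors the hourly schedule suggested above. It counts the second and later copies of each identical `_raw` per host and source instead of deleting them.

```
index=your_index sourcetype=your_sourcetype earliest=-1h
| streamstats count AS _serial BY _raw
| search _serial>1
| stats count AS duplicate_events BY host, source
```

If the counts look right, the `stats` line can be swapped for `| delete`. Keep in mind that `delete` requires the can_delete role and only hides events from search; it does not free disk space or give back license usage.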
There is no way to do this in Splunk pre-indexing. Via search you could do a dedup on the messages. One thing you could do is copy the file to the other server under a different file name, and then index that file as well. Then at least your host and source will be different within the sourcetype. For example, for a file copied from Active (host1) to Standby (host2): host=host2 source=mainlog.log-copy_from_active.
The other question is: if you have Splunk on both the Active and Standby servers, why do you need to copy the logs around? Splunk will ingest them on both as events are generated, and then in Splunk Search you can see these messages by source and host.
I've been using dedup, but was hoping there was a way to not index the duplicates to begin with, as the log files are identical and add no value to the index.
The application also uses the log files internally for its users to query in the native environment. If the files aren't synchronised between the servers, users will get different results depending on which server is currently active. Either can be active at any one time.
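For anyone landing here later, the search-time dedup mentioned above can be sketched roughly as follows. `your_index` and `your_sourcetype` are placeholders, and the sketch assumes the synchronised copies are byte-for-byte identical so that `_raw` matches across both hosts.

```
index=your_index sourcetype=your_sourcetype
| dedup _raw
```

This only drops repeated events from the results of that one search. Both copies are still indexed and still count against the license.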
Again, since these are two different systems, why don't you just ingest the logs on each host (use a Splunk UF with a monitor input)?
The logs will appear from two distinct hosts, and you can search based on that. For example:
index=notgoodloggingsystem (host=maybeactivehost1 OR host=maybeactivehost2) source="c:\mycrappy logs\logfile.log"
If you do this, there is no need to copy logs between hosts and worry about event duplication.
Unfortunately the logs' primary function is within the application. They are used by the application's users, so whichever machine is active, a complete list must be available to them. That means the logs have to be replicated across the two systems.
But yes, I completely agree: if the logs did not need to be replicated across both systems, this would not be an issue.
Unfortunately, right now log files are the only option. They have discussed being able to forward logs to other systems, but as of right now that requires recompiling DLLs and a few other things. It would also only end up in SQL Server, which we would then need a license for, and it would introduce a further delay.
I think for now I'll use dedup or mark the duplicate records as deleted, and hopefully later on they will add the syslog option. That would be the ideal scenario and should be relatively easy from a coding perspective.
I'm sorry, I'm not sure what you mean exactly. But the application that generates the logs has no concept of what syslog is. It can only write to a file, which is then rolled over once per day (usually, though it can be more often).
Meh, worth a shot. Many applications are able to use syslog both to send to a remote host and to write to disk. If you know syslog is not an option for remote logging here, then I guess the quest continues...