Is it possible to collect logs from Active/Standby application server pair without log duplication?

tonyparreiro · ‎03-20-2017

Hello,

We have an application that runs on two servers: one is the active server and the other is a hot standby, so if one server fails the other automatically takes over. We can also force a failover as part of normal maintenance tasks.

The problem is that the application generates logs on the currently active server, but periodically the log directory is synchronized so that we have a full set of history on both machines. That way, if one ever goes down catastrophically, we can recover.

Setting up a Splunk Universal Forwarder on each of the machines will send 2 copies of the logs to Splunk.

Is there some method people have used to stop ingesting duplicate log files/entries from what is essentially 2 separate systems?

Thanks,
Tony

woodcock · ‎03-24-2017

The best that you can do is to schedule a search like this to run every hour for the last hour to delete the duplicates. It does not save you license but should speed up your searches and confuse people less:

| streamstats count AS _serial BY _raw | search _serial>1 | delete
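
A cautious way to try this is to run the same logic without `| delete` first and look at what it would remove. This is only a sketch under assumptions: the index name `notgoodloggingsystem` is the example name used elsewhere in this thread, `dup_count` is a hypothetical field name, and `earliest=-1h` stands in for the "last hour" window of the scheduled search.

```
index=notgoodloggingsystem earliest=-1h
| streamstats count AS dup_count BY _raw
| search dup_count>1
| table _time host source _raw
```

If the listed events really are the unwanted copies, swapping the final `table` for `delete` marks them as unsearchable. Note that `delete` needs the `can_delete` role, and the events stay on disk and have already counted against the license.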

tonyparreiro · ‎03-26-2017

Don't really like the idea of using up double the license, but looks like that might have to be the way.

esix_splunk · ‎03-21-2017

There is no way to do this in Splunk before indexing. At search time you could run a dedup on the messages. One thing you could do is copy the file to the other server under a different file name and index that file as well. Then at least the host and source will differ for the sourcetype. For example, a file copied from Active (host1) to Standby (host2) would show up as host=host2 source=mainlog.log-copy_from_active.

The other question to this would be, if you have Splunk on both the Active and Standby server, then why do you need to copy the logs around? Splunk will ingest these on both as events are generated, and then in Splunk Search you can see these messages, by source and host.

tonyparreiro · ‎03-21-2017

I've been using dedup, but was hoping there was a way to not index it to begin with, as the log files are identical and add no value to the index.

The application also uses the log files internally for its users to query in the native environment. If the files aren't synchronised between the servers then they will get different results depending on which is the current active server. Either can be active at any one time.
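
For anyone following along, the search-time dedup mentioned here looks roughly like the search below. It is only a sketch: the index name is borrowed from the example elsewhere in this thread, and deduplicating on `_raw` assumes the copied events are identical to the originals character for character, timestamps included.

```
index=notgoodloggingsystem
| dedup _raw
```

This hides the second copy at search time only. Both copies are still indexed and still count against the license.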

esix_splunk · ‎03-26-2017

Again, since these are two different systems, why don't you just ingest on each host (use a Splunk UF with a monitor input)?

The logs will appear from two distinct hosts, and you can search based on that, e.g.:

index=notgoodloggingsystem (host=maybeactivehost1 OR host=maybeactivehost2) source="c:\mycrappy logs\logfile.log"

If you do this, there is no need to copy logs between hosts and worry about event duplication.

tonyparreiro · ‎03-27-2017

Unfortunately, the logs' primary function is within the application. They are used by the users of the application, so they need to be synchronised across both machines; that way, whichever machine is active, a complete list is available to the user. So they must be replicated across the 2 systems.

But yes, I completely agree: if the logs did not need to be replicated across both systems, this would not be an issue.

esix_splunk · ‎03-28-2017

Is there any log output mechanism aside from this log file? Something you could send to an HEC endpoint? Sounds like a long shot...

tonyparreiro · ‎03-28-2017

Unfortunately, right now log files are the only option. They have discussed being able to forward logs to other systems, but as of right now that requires recompiling DLLs and a few other things. It would also only end up in SQL Server, which we would then need a license for, and it would introduce a further delay.

I think for now we will dedup or mark the duplicate records as deleted, and hopefully later on they will add the syslog option. That would be the ideal scenario and should be relatively easy from a coder's perspective.

mattymo · ‎03-24-2017

Can you just use syslog? Then instead of getting mixed up in this sync process, you just catch a stream from the boxes and you don't have to worry about who is active and who isn't?

- MattyMo

tonyparreiro · ‎03-26-2017

I'm sorry, I'm not sure what you mean exactly. The application that generates the logs has no concept of what syslog is; it can only write to a file, which is then rolled over once per day (usually, though it can be more often).

mattymo · ‎03-27-2017

Meh, worth a shot. Many applications are able to use syslog both to send to a remote host and to write to disk. If you know syslog is not an option for remote logging here, then I guess the quest continues...

- MattyMo

tonyparreiro · ‎03-28-2017

Sadly this app is not one of those that knows what syslog is.

There is scope for having the vendor add it down the road, but this will take some time.

thanks,

mattymo · ‎03-28-2017

sad panda. unfortunately it sounds like dedup is the easiest option here...

- MattyMo

tonyparreiro · ‎03-28-2017

I think so, for now at least.
