Getting Data In

Is it possible to collect logs from an Active/Standby application server pair without log duplication?

tonyparreiro
Explorer

Hello,

We have an application that runs on 2 servers: one is the active server and the other is a hot standby, so if one server fails the other automatically takes over. We can also force it to fail over as part of normal maintenance tasks.

The problem is that the application generates logs on the currently active server, but the log directory is periodically synchronized so that we have a full set of history on both machines. That way, if one server ever goes down catastrophically, we can recover.

Setting up a Splunk Universal Forwarder on each of the machines will send 2 copies of the logs to Splunk.

Is there some method people have used to stop ingesting duplicate log files/entries from what is essentially 2 separate systems?

Thanks,
Tony


woodcock
Esteemed Legend

The best that you can do is schedule a search like the following to run every hour over the last hour to delete the duplicates (note that the delete command requires the can_delete capability). It does not save you any license usage, but it should speed up your searches and confuse people less:

index=* earliest=-1h | streamstats count AS _serial BY _raw | search _serial>1 | delete
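If you want that cleanup to run on a schedule, a savedsearches.conf sketch might look like the following (the stanza name and index are placeholders, and the role that owns the search needs the can_delete capability):

```ini
# savedsearches.conf -- hypothetical scheduled duplicate-cleanup search
[Delete duplicate events]
search = index=yourindex | streamstats count AS _serial BY _raw | search _serial>1 | delete
dispatch.earliest_time = -1h
dispatch.latest_time = now
enableSched = 1
cron_schedule = 0 * * * *
```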

tonyparreiro
Explorer

I don't really like the idea of using up double the license, but it looks like that might have to be the way.


esix_splunk
Splunk Employee

There is no way to do this in Splunk pre-indexing. Via search you could do a dedup on the messages. One thing you could do is copy the log to the other server under a different file name, and then index that file as well. Then at least your host and source will be different for the sourcetype. So, having been copied from Active (host1) to Standby (host2): host=host2 source=mainlog.log-copy_from_active.
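A rough inputs.conf sketch of that idea on the standby host, assuming the synced copy is renamed as suggested (the path, index, and sourcetype here are placeholders):

```ini
# inputs.conf on the standby host (host2) -- monitors the renamed copy
# so its source is distinguishable from the live log on the active host
[monitor:///opt/app/logs/mainlog.log-copy_from_active]
index = yourindex
sourcetype = app_log
```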

The other question to this would be: if you have Splunk on both the Active and Standby servers, why do you need to copy the logs around? Splunk will ingest them on both as events are generated, and then in Splunk Search you can see these messages by source and host.


tonyparreiro
Explorer

I've been using dedup, but I was hoping there was a way to not index it to begin with, as the log files are identical and add no value to the index.
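For reference, the search-time dedup being described can be sketched like this (the index and source are placeholders; dedup on _raw keeps the first copy of each identical event regardless of which host sent it):

```
index=yourindex source=*mainlog.log* | dedup _raw
```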

The application also uses the log files internally, for its users to query in the native environment. If the files aren't synchronised between the servers, then users will get different results depending on which is the currently active server. Either can be active at any one time.


esix_splunk
Splunk Employee

Again, since these are two distinct systems, why don't you just ingest locally (use a Splunk UF with a monitor) on each host?

The logs will appear from two distinct hosts, and you can search based on that. E.g...

index=notgoodloggingsystem (host=maybeactivehost1 OR host=maybeactivehost2) source="c:\mycrappy logs\logfile.log"

If you do this, there is no need to copy logs between hosts and worry about event duplication.
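The per-host UF side of this could be sketched in inputs.conf like so (reusing the hypothetical names from the example search; the same stanza would go on both hosts):

```ini
# inputs.conf on each host -- the UF monitors the local log file directly
[monitor://c:\mycrappy logs\logfile.log]
index = notgoodloggingsystem
sourcetype = app_log
```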


tonyparreiro
Explorer

Unfortunately, the logs' primary function is within the application: they are used by the users of the application and so need to be synchronised across both machines, so that whichever machine is active, a complete list is available to the user. So they must be replicated across the 2 systems.

But yes, I completely agree: if the logs did not need to be replicated across both systems, this would not be an issue.


esix_splunk
Splunk Employee

Is there any logging output mechanism aside from this log file? Something you could send out to a HEC endpoint? Sounds like a long shot...


tonyparreiro
Explorer

Unfortunately, right now log files are the only option. They have discussed being able to forward logs to other systems, but as of right now that requires recompiling DLLs and a few other things, and it would only end up in SQL Server, which would then need a license, plus it would also introduce a further delay.

I think for now we'll dedup, or mark the duplicate records as deleted, and hopefully later on they will add the syslog option. It would be the ideal scenario and should be relatively easy from a coder's perspective.


mattymo
Splunk Employee

Can you just use syslog? Then instead of getting mixed up in this sync process, you just catch a stream from the boxes and you don't have to worry about who is active and who isn't?

- MattyMo

tonyparreiro
Explorer

I'm sorry, I'm not sure what you mean exactly. But the application that generates the logs has no concept of what syslog is; it can only write to a file, which is then rolled over once per day (usually, though it can be more often).


mattymo
Splunk Employee

Meh, worth a shot. Many applications are able to use syslog both to send to a remote host and to write to disk. If you know syslog is not an option for remote logging here, then I guess the quest continues....

- MattyMo

tonyparreiro
Explorer

Sadly, this app is not one of those that knows what syslog is.

There is scope for having the vendor add it down the road, but this will take some time.

Thanks,


mattymo
Splunk Employee

sad panda. unfortunately it sounds like dedup is the easiest option here...

- MattyMo

tonyparreiro
Explorer

I think so, for now at least.
