Getting Data In

How to avoid duplicates if two forwarders are monitoring the same directory?

yoyu777
Explorer

We are considering deploying Splunk forwarders on our servers.
For resilience, we want to install a forwarder on each of the two servers.
The problem is that the files to be monitored can only be written to a shared drive, to which both servers have access.

My question is: if I configure the two forwarders to monitor the same directory, is there a way to avoid duplicates?
We will probably use the "batch" input method. Is there a way to lock a file while it is being forwarded, so that the other forwarder will not forward it?

1 Solution

DalJeanis
Legend

Hmmm. Can't see much of a way to get any value out of this while avoiding thrashing. If nothing distinguishes what one forwarder must index from what the other must, both of them will constantly spend extra time and occasionally be stepping on each other.

You COULD try complementary blacklists/whitelists, so that each forwarder would only pick up certain file names. If all your log files ended with a digit that was effectively random, for example, then one forwarder could index only odd numbers and the other even numbers.
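A minimal sketch of that idea, assuming a hypothetical shared path /mnt/shared/logs and file names that end in a digit before ".log" (adjust the path and the regex to your actual naming convention):

# Forwarder A inputs.conf -- only pick up files whose names end in an even digit
[monitor:///mnt/shared/logs]
whitelist = [02468]\.log$
disabled = false

# Forwarder B inputs.conf -- only pick up files whose names end in an odd digit
[monitor:///mnt/shared/logs]
whitelist = [13579]\.log$
disabled = false

The whitelist value is a regex matched against the file path, so as long as the two patterns are disjoint, each file is read by exactly one forwarder.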

Perhaps a more viable solution would be to have them each monitor a subdirectory, have an external process copy/move "their" files to "their" input hopper, and then have Splunk send the ingested files to the bit bucket.
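A rough sketch of that "hopper" style, assuming a hypothetical per-server subdirectory: a batch input with move_policy = sinkhole reads each file once and then deletes it, so the hopper empties itself as files are ingested.

# Forwarder A inputs.conf -- ingest everything dropped into this server's hopper, then delete it
[batch:///mnt/shared/logs/hopper_a]
move_policy = sinkhole
disabled = false

The external copy/move process (not Splunk) is what decides which files land in which server's hopper.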

A hybrid between these approaches would be to have both monitor the single directory for specific (disjoint) filenames, and as a backup, have each monitor a subdirectory which would be their "personal" input hopper. If one server went down, some process would then move the input to the other server's hopper for consumption.
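Putting the two together, each forwarder's inputs.conf could carry both stanzas; again, the paths and the filename pattern are placeholders, and the failover move into the surviving server's hopper would be done by some external script or scheduler, not by Splunk itself.

# Forwarder A inputs.conf -- hybrid sketch
# Primary: this forwarder's share of the common directory
[monitor:///mnt/shared/logs]
whitelist = [02468]\.log$

# Backup: a personal hopper filled by a failover process when forwarder B is down
[batch:///mnt/shared/logs/hopper_a]
move_policy = sinkhole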


nayakr
Engager

Hi,

I have a question here.
In our project design we have 2 heavy forwarders on each site to add resiliency, so that if one heavy forwarder goes down, the other forwarder will continue the job.
This is at an initial stage.

In our case the heavy forwarders are meant to consume messages from JMS queues.
But how will the heavy forwarders know which one should perform the job?
How do we design resiliency on each site (for cases like patching, maintenance activity, etc.) so that the other forwarder on the same site continues the job?
Will there be any duplicates if 2 forwarders are listening to the same JMS queues?
If yes, how do we avoid them and still have resiliency?

Waiting for your update.

Thanks,
Ravi




yoyu777
Explorer

Thanks for the answer, DalJeanis.
