
How do I configure a Heavy Forwarder to dedup and forward to Splunk Cloud?

michaelmarshall
Explorer

I have a custom solution to forward CloudWatch Logs events to Splunk Cloud.  It works great!  However, I am now trying to use two groups of HFs deployed as Fargate containers, 4 instances in each group.  I am treating them as 4 on the A side and 4 on the B side of an HA configuration.

I'm trying to approximate the functionality of UF > HF autoLB in outputs.conf, only in this case the UF is a Lambda function.
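
In other words, the behavior I'm trying to mimic is an ordinary UF-side autoLB stanza, something like this (the server names are just placeholders for my Fargate HF instances):

# outputs.conf on a UF (what the Lambda is standing in for)
[tcpout:hf_group]
server = hf-a-1.example.com:9997, hf-a-2.example.com:9997, hf-b-1.example.com:9997, hf-b-2.example.com:9997
autoLBFrequency = 30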

I tried sending events to one HF instance on both the A side and the B side, but I end up with duplicates for every event, which makes complete sense as there is no auto-dedup.

What I want to do for now is bring up a single HF that receives ALL traffic from all A-side and B-side HF instances.  I want to configure it to dedup all events and send the result to Splunk Cloud.

Is this doable?  Would it create much latency?
How would I configure that: inputs.conf, transforms, props?  (I have outputs covered.)
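
For context, I assume the receiving side on that aggregating HF is just a standard splunktcp input, something like this (the port is only an example):

# inputs.conf on the aggregating HF
[splunktcp://9997]
disabled = 0

It's the dedup step in the middle (props/transforms?) that I can't work out.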

Thank You,
Mike


richgalloway
SplunkTrust

Splunk does not deduplicate inputs.  You may be able to write a script or program to serve as an intermediary that receives data from two sources and removes duplicates.  Such a work-around could be low-latency by forwarding the first instance of an event immediately.  It would tend to use a lot of memory, however, while it retains events for comparison with events from the other source.  Or it could consider only one source to be 'active' and ignore events from the other source until the first becomes non-responsive.  Either way, it's not a Splunk feature/capability.
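
To make that idea more concrete, here is a rough Python sketch of the first approach (forward the first copy of an event immediately, remember it for a while, drop any later copy).  Everything in it is illustrative: the hostnames, ports, and five-minute window are placeholders, and it assumes the A-side and B-side senders deliver plain newline-delimited events over TCP rather than the Splunk-to-Splunk protocol, so in practice something like this would sit in front of the HFs rather than between them and Splunk Cloud.

#!/usr/bin/env python3
# Sketch of an intermediary that accepts events from two sources,
# relays the first copy downstream immediately, and drops duplicates
# seen within a time window.  All names, ports, and windows are examples.
import hashlib
import socket
import socketserver
import threading
import time

LISTEN_ADDR = ("0.0.0.0", 9998)                  # where A-side and B-side senders connect
DOWNSTREAM_ADDR = ("hf-out.example.com", 9997)   # placeholder downstream receiver
DEDUP_WINDOW_SECS = 300                          # how long to remember an event

seen = {}              # event digest -> time first seen
seen_lock = threading.Lock()

def is_duplicate(event: bytes) -> bool:
    """Return True if this event was already seen within the window."""
    digest = hashlib.sha256(event).hexdigest()
    now = time.time()
    with seen_lock:
        # Evict expired entries so memory use stays bounded.
        for key in [k for k, t in seen.items() if now - t > DEDUP_WINDOW_SECS]:
            del seen[key]
        if digest in seen:
            return True
        seen[digest] = now
        return False

class DedupHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One upstream connection per handler; relay unseen lines downstream.
        with socket.create_connection(DOWNSTREAM_ADDR) as downstream:
            for line in self.rfile:
                event = line.rstrip(b"\n")
                if event and not is_duplicate(event):
                    downstream.sendall(event + b"\n")

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(LISTEN_ADDR, DedupHandler) as server:
        server.serve_forever()

The memory cost mentioned above lives in the seen table: the longer the window, the more digests it holds, which is the trade-off against letting a late duplicate through.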

---
If this reply helps you, Karma would be appreciated.


michaelmarshall
Explorer

Interesting, I know there is a search-time dedup command; is there no index-time equivalent?

Also, when a UF is configured to output to 2 HFs that are configured for autoLB, does it actually send each event to both HFs, or does it send to the first available or whichever responds?  If it sends to both, how does the de-duplication happen?


richgalloway
SplunkTrust

As already stated, there is no deduplication at ingest or index time.

Forwarders that load balance across multiple indexers send to one indexer until some criterion is met (time, volume, or the indexer becomes unavailable) and then switch to the next.  A forwarder will send the same event twice only if indexer acknowledgment is in effect and an ACK is not received.  When that happens, one or more events may be duplicated and that must be handled at search time.
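
For reference, that behavior is controlled in outputs.conf on the forwarder; the settings look roughly like this (server names and values are only examples):

[tcpout:my_indexers]
server = idx-1.example.com:9997, idx-2.example.com:9997
autoLBFrequency = 30
autoLBVolume = 1048576
useACK = true

useACK is what introduces the possibility of the occasional duplicate described above.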

---
If this reply helps you, Karma would be appreciated.

michaelmarshall
Explorer

Thank you for your explanation.  I will consider this in my architecture.
