How do I configure a Heavy Forwarder to dedup and forward to Splunk Cloud?

michaelmarshall
Explorer

I have a custom solution to forward CloudWatch Logs events to Splunk Cloud.  It works great!  However, I am trying to use a pair of HF groups running as Fargate containers, four instances of each.  I am treating them as four on the A side and four on the B side of an HA configuration.

I'm trying to approximate the UF > HF autoLB functionality in outputs.conf, only in this case the UF is a Lambda function.
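
For context, the autoLB behavior being approximated is normally driven by an outputs.conf target group on the forwarder.  A minimal sketch, with the group name and hostnames as placeholders rather than my real config:

# Illustrative UF outputs.conf: auto load balancing across a group of HFs.
# Group name and hostnames are placeholders.
[tcpout]
defaultGroup = hf_side_a

[tcpout:hf_side_a]
server = hf-a1.example.com:9997, hf-a2.example.com:9997, hf-a3.example.com:9997, hf-a4.example.com:9997
autoLBFrequency = 30
# Optional: indexer acknowledgment; retries after a missed ACK can themselves duplicate events.
useACK = true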

I tried sending events to one HF instance on both the A and B sides, but I end up with duplicates for every event, which makes complete sense as there is no auto-dedup.

What I want to do for now is bring up a single HF that receives ALL traffic from the A-side and B-side HF instances.  I want to configure it to dedup all events and send the result to Splunk Cloud.

Is this doable?  Would it create much latency?
How would I configure that: inputs.conf, transforms, props?  (I have outputs covered.)

Thank You,
Mike

richgalloway
SplunkTrust

Splunk does not deduplicate inputs.  You may be able to write a script or program to serve as an intermediary that receives data from two sources and removes duplicates.  Such a workaround could be low-latency by forwarding the first instance of an event immediately.  It would tend to use a lot of memory, however, while it retains events for comparison with events from the other source.  Or it could consider only one source to be 'active' and ignore events from the other source until the first becomes non-responsive.  Either way, it's not a Splunk feature/capability.
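
A very rough sketch of what that intermediary could look like, assuming both sides forward events as newline-delimited raw text over TCP, that an identical payload from either side means the same event, and using made-up ports and a fixed retention window.  It is not a Splunk component, just an illustration of the approach:

import asyncio
import hashlib
import time

LISTEN_PORT = 9900            # hypothetical port both A-side and B-side HFs send to
FORWARD_HOST = "127.0.0.1"    # hypothetical downstream forwarder toward Splunk Cloud
FORWARD_PORT = 9997
SEEN_TTL_SECONDS = 300        # how long to remember an event for duplicate detection

seen = {}                     # event hash -> time first seen

def is_duplicate(raw: bytes) -> bool:
    """Return True if this event was already forwarded within the TTL window."""
    now = time.monotonic()
    # Evict old entries so memory stays bounded (the memory cost mentioned above).
    for key, ts in list(seen.items()):
        if now - ts > SEEN_TTL_SECONDS:
            del seen[key]
    digest = hashlib.sha256(raw).hexdigest()
    if digest in seen:
        return True
    seen[digest] = now
    return False

async def handle_client(reader, writer):
    # Forward the first copy of each event immediately; drop later copies.
    fwd_reader, fwd_writer = await asyncio.open_connection(FORWARD_HOST, FORWARD_PORT)
    try:
        async for line in reader:
            event = line.strip()
            if event and not is_duplicate(event):
                fwd_writer.write(event + b"\n")
                await fwd_writer.drain()
    finally:
        fwd_writer.close()
        await fwd_writer.wait_closed()
        writer.close()

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())

The 'forward the first instance immediately' approach keeps latency low, but the seen table is exactly the memory cost described above; the alternative active/standby approach avoids that by ignoring the B side entirely until the A side stops responding.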

---
If this reply helps you, Karma would be appreciated.

michaelmarshall
Explorer

Interesting.  I know there is a search-time dedup command; is there no index-time equivalent?

Also, when a UF is configured to output to two HFs that are configured for autoLB, does it actually send each event to both HFs, or does it send to the first available or whichever responds?  If it sends to both, how does the de-duplication happen?

richgalloway
SplunkTrust

As already stated, there is no deduplication at ingest or index time.

Forwarders that load balance across multiple indexers send to one indexer until some criterion is met (time, volume, or the indexer becomes unavailable) and then switch to the next.  A forwarder will send the same event twice only if indexer acknowledgment is in effect and an ACK is not received.  When that happens, one or more events may be duplicated, and that must be handled at search time.
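
For illustration, handling those occasional ACK-related duplicates at search time can be as simple as something like the following, where the index and sourcetype are placeholders:

index=my_cloudwatch_index sourcetype=aws:cloudwatchlogs
| dedup _raw

dedup keeps the first copy of each identical raw event; if your events carry a unique identifier field, deduping on that field is usually preferable to comparing the full _raw.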

---
If this reply helps you, Karma would be appreciated.

michaelmarshall
Explorer

Thank you for your explanation.  I will consider this in my architecture.
