With 2 sources producing similar data, how to dedup events within 2 seconds of each other, but only keep events from one particular source?

gesman
Communicator

I have two sources of traffic logs, my_source1 and my_source2, that record approximately the same data with a few important differences.
I need to dedup the data roughly like this:
source=my_source* | dedup _time, ip, page

But with one important difference:
If events with the same ip and page occur within 2 seconds of each other, they should be treated as duplicates, and only the my_source2 event should be kept, even if the my_source1 event occurred earlier.
What's the most efficient way to accomplish that?

Note: the system generates up to 100,000 events per hour.
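Here is a minimal sketch of the rule described above. It uses streamstats and eventstats instead of transaction, which is a swapped-in technique and not taken from this thread. It assumes the source field holds exactly my_source1 and my_source2. It also assumes that "within 2 seconds" means each event arrives at most 2 seconds after the previous event for the same ip and page, so a steady burst can form one group longer than 2 seconds. The fields prev_time, new_cluster, cluster_id, is_src2 and cluster_has_src2 are hypothetical helpers introduced only for this example.

```spl
source=my_source1 OR source=my_source2
| sort 0 _time
| streamstats current=f last(_time) as prev_time by ip page
| eval new_cluster=if(isnull(prev_time) OR _time - prev_time > 2, 1, 0)
| streamstats sum(new_cluster) as cluster_id by ip page
| eval is_src2=if(source=="my_source2", 1, 0)
| eventstats max(is_src2) as cluster_has_src2 by ip page cluster_id
| where is_src2=1 OR cluster_has_src2=0
| dedup ip page cluster_id
```

Step by step: sort 0 orders events oldest first and lifts sort's default result limit, which matters at 100,000 events per hour. The first streamstats carries the previous event's _time for each ip/page pair. A gap of more than 2 seconds sets new_cluster=1, and the running sum of those flags numbers the groups. eventstats marks every group that contains at least one my_source2 event, and where then drops my_source1 events from those groups. The final dedup keeps the earliest remaining event per group. Remove it if every my_source2 event in a group should be kept.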


inode
Explorer

I would suggest using the transaction command if the data volume is not too high. Its biggest advantage is that it aggregates similar events from the different sources into one transaction and adds a "duration" field, which is the time between the first and last events in that transaction.

With eval's mvindex() you can then keep only the values from the first or last event in the transaction.
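A minimal sketch of that transaction approach, assuming the grouping fields are ip and page and that maxspan=2s (the total span of a transaction) is an acceptable reading of "within 2 seconds". The fields first_source, last_source and kept_source are hypothetical names. mvlist=source keeps source as a per-event list rather than a deduplicated, alphabetically sorted set, so mvindex(source, 0) and mvindex(source, -1) pick the values from the ends of that list. mvfind() is used here to check whether any event in the transaction came from my_source2.

```spl
source=my_source1 OR source=my_source2
| transaction ip page maxspan=2s mvlist=source
| eval first_source=mvindex(source, 0), last_source=mvindex(source, -1)
| eval kept_source=if(isnotnull(mvfind(source, "^my_source2$")), "my_source2", first_source)
| table _time ip page eventcount duration first_source last_source kept_source
```

Note that transaction merges the grouped events into one result, so any other per-event field you need from the my_source2 event must also be listed in mvlist and then extracted with mvindex().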
