
Deduplicate events - how can we override?

cyberhaven
New Member
We use a script as a data source, and it sometimes emits duplicate events (same ID). Using | dedup id at search time helps, but we would like a new event to overwrite any existing event with the same ID, if possible. We have tried several solutions from the internet and the documentation, but none of them worked.
 
props.conf
[incidents_script]
TZ = UTC
category = Splunk App Add-on Builder
pulldown_type = 1
python.version = python3
TRUNCATE = 1000000
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = trigger_time
SHOULD_LINEMERGE = false
AUTO_KV_JSON = false
KV_MODE = none
TRANSFORMS-index = replace_existing deduplicate
REPORT-id = extract_id
TRANSFORMS-debug = debug_deduplicate
EXTRACT-id = "id"\s*:\s*"([^"]+)"

transforms.conf

[replace_existing]
REGEX = .
DEST_KEY = _SYS_CHECKSUM
FORMAT = index-replace

[deduplicate]
REGEX = .
MV_ADD = true

[debug_deduplicate]
REGEX = .
MV_ADD = true

[extract_id]
REGEX = "id"\s*:\s*"([^"]+)"
FORMAT = id::$1



PickleRick
SplunkTrust

If I understand you correctly, you'd like to deduplicate events on ingest: either skip an event if one with the same value of the id field has already been indexed, or overwrite the previously indexed event with the new one.

Well, that's not possible with native Splunk functionality.

1. Splunk ingestion process works one event at a time.

2. Splunk ingestion process works "one-way" - you can't "check what's already in the index". Remember that parsing can be performed way, way before the event even reaches the indexers (and different events from the same source can be processed on different components). Also, you don't have access to search-time extracted values during the ingestion process.

3. There is no "overwriting" in Splunk. Once an event is indexed, it cannot be modified.

So if it's really essential for you not to ingest duplicate IDs, you need to design your own ingestion process that keeps your events deduplicated before they reach Splunk (but for that you'd need some buffer window, which will increase latency).
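
To illustrate the buffer-window idea, here is a minimal Python sketch of what the script could do before it prints events for Splunk to pick up. It is only an assumption about how such a script might be structured: the DedupBuffer class, the window_seconds parameter and the emit helper are hypothetical names, and the only thing taken from the question is that every event is a JSON object with an "id" field. Events are held for a fixed window, a later event with the same ID replaces the buffered one, and only events older than the window are released.

```python
import json
import time
from collections import OrderedDict


class DedupBuffer:
    """Hold events for a fixed window, keeping only the latest event per ID.

    Hypothetical helper, not part of Splunk or any Splunk SDK.
    """

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        # id -> (first_seen_time, event); insertion order is first-seen order
        self._pending = OrderedDict()

    def add(self, event):
        """Buffer an event; a later event with the same ID replaces the earlier one."""
        event_id = event["id"]
        if event_id in self._pending:
            first_seen, _ = self._pending[event_id]
            # Re-assigning an existing key keeps its position in the OrderedDict.
            self._pending[event_id] = (first_seen, event)
        else:
            self._pending[event_id] = (self.clock(), event)

    def flush(self, force=False):
        """Return events whose window has expired (or all of them if force=True)."""
        now = self.clock()
        ready = []
        while self._pending:
            event_id, (first_seen, event) = next(iter(self._pending.items()))
            if not force and now - first_seen < self.window_seconds:
                break  # everything after this entry is newer
            del self._pending[event_id]
            ready.append(event)
        return ready


def emit(events):
    """Print events as JSON lines on stdout, one per line."""
    for event in events:
        print(json.dumps(event))


if __name__ == "__main__":
    buf = DedupBuffer(window_seconds=60.0)
    buf.add({"id": "42", "status": "open"})
    buf.add({"id": "42", "status": "closed"})  # replaces the buffered event
    emit(buf.flush(force=True))  # force a flush on shutdown
```

Note the trade-off: every event is delayed by up to the window length, and a duplicate that arrives after its ID has already been flushed will still be ingested, so | dedup id (or similar) at search time stays useful as a safety net.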
