I'm working on a custom Splunk app (VendorThreatIntel) that ingests alert data from an external API using the Splunk Python SDK. Before inserting events into a custom index, I perform a duplicate check based on a unique id field: for each incoming record, I run a one-shot search against the index to see whether the record already exists. If it is found, I skip the insertion; otherwise, I index the event. Below is the logic I'm using (relevant imports included):

import json
from collections.abc import Sequence

import splunklib.client as client
import splunklib.results as results

if 'data' in response and isinstance(response['data'], Sequence):
    inserted_count = 0
    skipped_count = 0
    logger.info("[Vendor Events] Received %s records for processing", len(response['data']))

    server = client.connect(token=session_key, app="VendorThreatIntel")

    for row in response['data']:
        row['service_name'] = service['displayName']
        row_id = row.get("id")

        # One-shot search to check whether this id has already been indexed.
        search_query = f'search index="vendor_alerts" id="{row_id}"'
        logger.info("[Vendor Events] Checking for existing record: %s", search_query)

        results_found = False
        try:
            rr = results.JSONResultsReader(
                server.jobs.oneshot(search_query, output_mode="json")
            )
            for result in rr:
                if isinstance(result, dict):
                    logger.info(
                        "[Vendor Events] Duplicate record found with id=%s, skipping insertion.",
                        row_id
                    )
                    results_found = True
                    skipped_count += 1

            # No duplicate found: submit the event to the custom index.
            if not results_found:
                index = server.indexes["vendor_alerts"]
                index.submit(json.dumps(row))
                inserted_count += 1
        except Exception as e:
            logger.error("[Vendor Events] Error inserting data: %s", str(e))

    logger.info(
        "[Vendor Events] Summary | Total: %s | Inserted: %s | Skipped: %s",
        len(response['data']), inserted_count, skipped_count
    )

Even when records with the same id already exist in the index, the duplicate check never matches and duplicate events still get indexed. Why is the duplicate detection not being applied? Is this related to one-shot search behavior, indexing latency, or the timing of event availability during ingestion?

Additionally, I would like guidance that takes Splunk's distributed architecture (search head and indexers) into account. Is there a recommended or more reliable way to handle duplicate detection in Splunk, especially in a distributed environment? For example, would a batched pre-check like the sketch at the end of this post be any more dependable?

Thanks in advance.
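Here is the batched pre-check sketch mentioned above. The idea is to collect the already-indexed id values with a single one-shot search before the loop, and to also remember ids submitted earlier in the same batch (since freshly submitted events presumably aren't searchable right away). This is only a rough sketch: it assumes the id field is extracted at search time from the JSON events, it reuses the response, service, session_key, and logger objects from my script, and the insert_new_events wrapper is just for illustration.

import json
from collections.abc import Sequence

import splunklib.client as client
import splunklib.results as results


def insert_new_events(response, service, session_key, logger):
    # Hypothetical wrapper for the sketch; my real code runs inline as shown above.
    if 'data' not in response or not isinstance(response['data'], Sequence):
        return

    server = client.connect(token=session_key, app="VendorThreatIntel")

    # Single up-front search: collect every id already present in the index.
    # Assumes "id" is available as a search-time field on the indexed events.
    existing_ids = set()
    reader = results.JSONResultsReader(
        server.jobs.oneshot(
            'search index="vendor_alerts" | stats values(id) AS id | mvexpand id',
            output_mode="json",
        )
    )
    for item in reader:
        if isinstance(item, dict) and item.get("id") is not None:
            existing_ids.add(str(item["id"]))

    index = server.indexes["vendor_alerts"]
    inserted_count = 0
    skipped_count = 0

    for row in response['data']:
        row['service_name'] = service['displayName']
        row_id = str(row.get("id"))

        # Skip ids that are already indexed or were submitted earlier in this
        # batch; newly submitted events are not searchable immediately.
        if row_id in existing_ids:
            skipped_count += 1
            continue

        index.submit(json.dumps(row))
        existing_ids.add(row_id)
        inserted_count += 1

    logger.info(
        "[Vendor Events] Summary | Total: %s | Inserted: %s | Skipped: %s",
        len(response['data']), inserted_count, skipped_count,
    )

One concern with this approach is that pulling every id from the index could get expensive as the index grows, so I'm not sure it scales.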