Getting Data In

How do we define a duplicate record?

ws
Path Finder

We are currently using a custom app to connect to a case/monitoring system and retrieve data.

I found out that Splunk has the ability to detect whether data has already been indexed.

But what about the following scenario? Will it be considered duplicate or new data, given that the update carries a new case-closed time?

One of the previously closed cases has been reopened and closed again with a new case-closed time. Will Splunk Enterprise consider this new data to index?


richgalloway
SplunkTrust

Splunk cannot and does not detect whether data has already been indexed. As @gcusello said, it will attempt to avoid re-ingesting data, but that's not perfect.

It's up to the app doing the ingestion to prevent reading the same data twice.  In DB Connect, for example, a "rising column" is defined to identify unique records.  Your app could do something similar, using case ID and Closed Time, perhaps.
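A rising-column style checkpoint, as used in DB Connect, could be sketched in Python roughly like this. This is an illustrative sketch only: `fetch_cases`, the field names, and the checkpoint file path are hypothetical, not part of any real app or API.

```python
import json
import os

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical location

def load_checkpoint():
    """Return the last closed_time we indexed, or a sentinel if none exists."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {"closed_time": ""}  # empty string sorts before any ISO timestamp

def save_checkpoint(closed_time):
    """Persist the high-water mark so the next poll skips older records."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"closed_time": closed_time}, f)

def new_records(records, checkpoint):
    """Keep only records whose closed_time is later than the checkpoint.

    Assumes closed_time is an ISO-8601 string, so lexicographic
    comparison matches chronological order.
    """
    return [r for r in records if r["closed_time"] > checkpoint["closed_time"]]

# Usage sketch (fetch_cases() stands in for your API call):
# checkpoint = load_checkpoint()
# fresh = new_records(fetch_cases(), checkpoint)
# for r in fresh:
#     print(json.dumps(r))  # emit to Splunk
# if fresh:
#     save_checkpoint(max(r["closed_time"] for r in fresh))
```

Because a reopened-and-reclosed case gets a newer `closed_time`, it would pass the checkpoint filter and be indexed again as a new event, which matches the behavior asked about above.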

---
If this reply helps you, Karma would be appreciated.

ws
Path Finder

I understand that Splunk checks the first 256 chars of an event to see whether they are the same.

But in my current situation, would your recommendation be to customize the application to implement a checkpoint mechanism for tracking previously indexed records?


gcusello
SplunkTrust

Hi @ws ,

you have many ways to check for repetitive logs. The easiest is to save the logs in files with different names (e.g. adding the date and time) and use the crcSalt = <SOURCE> option in the related inputs.conf stanza.
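An inputs.conf stanza along those lines might look like the following sketch; the monitored path and sourcetype are placeholders, not real settings from this environment:

```ini
[monitor:///opt/case_logs/cases_*.log]
sourcetype = case_system
# Include the full source path in the file fingerprint, so files whose
# first bytes are identical but whose names differ are still indexed.
crcSalt = <SOURCE>
```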

Ciao.

Giuseppe


ITWhisperer
SplunkTrust

Splunk does not work like a database in this respect, so it depends on how Splunk has been set up to detect "duplicates" of this nature. This is normally done with searches in reports, alerts, or dashboards, and those will depend on your data.

What searches do you already have set up?

What does your data look like?

How is it being ingested into Splunk?

What criteria do you want to use to determine that an event represents a duplicate?

Please provide as much detail as you can (without giving away sensitive information).

ws
Path Finder

Currently, we are not focusing on searches but rather on the application created to pull data from the API provided by the destination party.

Based on my understanding of the current setup, the new data is being retrieved by the application through the destination API.

The data includes fields such as ID, case status, case close date, and others.

At this point, duplicates will be identified based on the ID field.

Please correct me if I'm wrong, but given the current setup, wouldn't this result in duplicate data? We are calling at 1-hour intervals, each pulling a 4-hour window of logs.

For example:

10am, 6am-10am
11am, 11am-3pm
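To make the overlap concrete: if every hourly poll pulls a 4-hour window, most events fall inside several consecutive windows and are fetched more than once. A small sketch (times simplified to whole hours; the function names are illustrative):

```python
def windows(poll_hours, window_len):
    """Each poll at hour t covers the half-open interval [t - window_len, t)."""
    return [(t - window_len, t) for t in poll_hours]

def polls_containing(event_hour, poll_hours, window_len=4):
    """Count how many poll windows include the event."""
    return sum(1 for start, end in windows(poll_hours, window_len)
               if start <= event_hour < end)

# An event closed at 9am is returned by the polls at 10am, 11am, 12pm, and 1pm:
print(polls_containing(9, poll_hours=range(24), window_len=4))  # → 4
```

So with this polling scheme each record would be retrieved up to four times unless the app deduplicates on the ID (or a checkpoint) before forwarding to Splunk.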


gcusello
SplunkTrust

Hi @ws ,

Splunk indexes all new data; the only exception is when the first 256 chars of the event are the same.

Then (after indexing) you can dedup the results, excluding duplicated data based on your requirements.

Deduping is usually done on one or more fields; it's also possible to dedup fully duplicated events on _raw.
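A search-time dedup of the kind described might look like this SPL sketch; the index, sourcetype, and field names are placeholders for whatever your app actually writes:

```
index=cases sourcetype=case_system
| dedup case_id closed_time
| table case_id, status, closed_time
```

Deduping on both case_id and closed_time keeps the reopened-and-reclosed event as a distinct result, while `| dedup case_id` alone would keep only the most recent event per case.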

Ciao.

Giuseppe

PickleRick
SplunkTrust

Are you sure you're not talking about the first 256 bytes of a monitored file? (Of course, the header length is configurable.) The only duplicate detection I recall is connected with useACK, and even then it indexes the event twice but emits a warning, AFAIR.
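For reference, the file-header fingerprint length PickleRick mentions is controlled by initCrcLength in inputs.conf (default 256 bytes). A sketch, with a placeholder path:

```ini
[monitor:///opt/case_logs/cases.log]
# Fingerprint more of the file head when many files share an
# identical 256-byte prefix (value is in bytes).
initCrcLength = 1024
```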
