Best way to check whether data exists before insert

KJ10

Currently we check whether data already exists in the Splunk DB using an isinstance method, but this requires iterating through the entire dataset, which is time-consuming. Is there a better way to check whether the same data already exists in the DB, to avoid duplication?
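One common alternative to scanning everything already stored is to keep a set of fingerprints (hashes) of the events that have already been sent. Each check then becomes a hash lookup instead of a full iteration. The Python below is a minimal sketch that makes several assumptions. Events are JSON-serializable dicts. Two events count as duplicates only if all of their fields match. The seen set is loaded from and saved to some store of your choosing, which is not shown. The names fingerprint and filter_new are hypothetical helpers, not part of any Splunk API.

```python
import hashlib
import json


def fingerprint(event):
    """Return a stable SHA-256 hex digest for an event (a JSON-serializable dict).

    Keys are sorted so two dicts with the same content but a different
    key order produce the same fingerprint.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_new(events, seen):
    """Yield only events whose fingerprint is not already in `seen`.

    `seen` is a set of fingerprints (hypothetical: loaded from wherever the
    previously sent fingerprints are kept). Each membership test is an
    average O(1) hash lookup rather than a scan of all stored data. New
    fingerprints are added as events are yielded, so duplicates within the
    same batch are dropped too.
    """
    for event in events:
        fp = fingerprint(event)
        if fp not in seen:
            seen.add(fp)
            yield event


if __name__ == "__main__":
    seen = set()
    batch = [{"id": 1, "msg": "a"}, {"msg": "a", "id": 1}, {"id": 2, "msg": "b"}]
    print(list(filter_new(batch, seen)))
```

The fingerprint is built with sort_keys=True, so the first two events in the example are treated as the same event even though their keys are in a different order.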

KJ10

Thanks for the update @ITWhisperer. We are doing the extraction at search time, but the user doesn't want duplication in the Splunk events either, so we implemented the isinstance method to check whether the data already exists. Is there any other way to check for duplicates?

ITWhisperer

It depends. Do you mean "duplicate" events being returned in your search? What is the level of duplication? Is it the whole event, i.e. if a single character is different then it is not a duplicate? Or is it that a particular field or set of fields has unique values? Or are there some other criteria you would use to determine whether an event is a duplicate?
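To make the difference between those criteria concrete, here is a small Python sketch using hypothetical helpers and assuming events are JSON-serializable dicts. The same two events count as distinct under a whole-event comparison. They count as duplicates when only the host and status fields are compared. The field names are illustrative only.

```python
import json


def dedup_key(event, fields=None):
    """Key that decides whether two events are duplicates.

    fields=None  -> the whole event must match exactly.
    fields=(...) -> only the listed fields must match; others are ignored.
    """
    if fields is None:
        return json.dumps(event, sort_keys=True)
    return json.dumps([event.get(f) for f in fields])


def dedupe(events, fields=None):
    """Keep the first event for each distinct key, preserving order."""
    seen = set()
    kept = []
    for event in events:
        key = dedup_key(event, fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


events = [
    {"host": "web01", "status": 200, "time": "10:00"},
    {"host": "web01", "status": 200, "time": "11:00"},
]
print(len(dedupe(events)))                             # 2: the time field differs
print(len(dedupe(events, fields=("host", "status"))))  # 1: host and status match
```

Which key is right depends entirely on the answer to the questions above. The stricter the definition of "duplicate", the less data is dropped.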

KJ10

Basically, we insert data using the REST API. Every hour our stream events are called and they dump all of the data again, so to avoid this we use a lookup before insertion. If we remove duplicates in the UI it works as expected, but the events themselves contain a lot of duplicate values, which take up a lot of space and slow down performance.
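If the hourly call re-sends the full data set each time, another option is to remember a checkpoint, such as the newest timestamp already sent, and forward only records newer than it. This technique is swapped in here and is not what the thread describes. It means no lookup against existing data is needed at all. The sketch below assumes each record has a sortable ts field and that the checkpoint is stored in a local JSON file. The file path, the field name and the helper functions are all hypothetical, and the actual REST call is left as a comment.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the newest timestamp already sent, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["last_ts"]


def save_checkpoint(path, last_ts):
    """Write to a temp file, then swap it in, to reduce the risk of a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"last_ts": last_ts}, fh)
    os.replace(tmp, path)


def run_once(records, checkpoint_path):
    """Select records newer than the checkpoint, then advance the checkpoint."""
    last_ts = load_checkpoint(checkpoint_path)
    new = [r for r in records if last_ts is None or r["ts"] > last_ts]
    # Hypothetical: send `new` to the REST endpoint you already use here,
    # and only save the checkpoint after the send succeeds.
    if new:
        save_checkpoint(checkpoint_path, max(r["ts"] for r in new))
    return new


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    dump = [{"ts": 1, "v": "a"}, {"ts": 2, "v": "b"}]
    print(len(run_once(dump, path)))  # first run: both records are new
    dump.append({"ts": 3, "v": "c"})  # next hour: full dump plus one new record
    print(len(run_once(dump, path)))  # only the ts=3 record is selected
```

The strict > comparison means that a late-arriving record with exactly the same timestamp as the checkpoint would be skipped. If that can happen, a unique record ID or a fingerprint check is safer.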

ITWhisperer

If you mean some sort of pre-indexing lookup, then the indexing/ingestion process in Splunk is not really designed for that. Any pre-indexing lookup or search would slow down the indexing process far too much and would be more likely to cause other issues. You would be better off doing your deduplication as part of the search process, which you could then use to populate a summary index with just the deduplicated events (or better yet, the aggregated results, depending on your use case).
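In Splunk itself this would normally be written in SPL. For example, you could use dedup, or stats count by the relevant fields, followed by collect to write the results into a summary index. The Python below does not run in Splunk. It only models why the aggregated option saves space: repeated hourly copies of the same record collapse into one row with a count. The field names are hypothetical.

```python
from collections import Counter


def aggregate(events, fields):
    """Return one row per distinct combination of `fields`, with a count.

    Roughly models `stats count by <fields>`; field values are assumed
    to be hashable (strings, numbers). Rows come out in first-seen order.
    """
    counts = Counter(tuple(e.get(f) for f in fields) for e in events)
    return [dict(zip(fields, key), count=n) for key, n in counts.items()]


# Three hourly dumps of the same record plus one genuinely new record.
events = [
    {"host": "web01", "status": 200, "ingest_hour": 1},
    {"host": "web01", "status": 200, "ingest_hour": 2},
    {"host": "web01", "status": 200, "ingest_hour": 3},
    {"host": "web02", "status": 500, "ingest_hour": 3},
]
for row in aggregate(events, ("host", "status")):
    print(row)
```

Four raw events become two summary rows here. The saving grows with every hourly re-dump of the same data.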
