Currently we check whether data already exists in the Splunk DB using the isinstance method, but this means iterating through the entire data set, which is time-consuming. Is there a better way to check whether the same data already exists in the DB, so we can avoid duplication?
Thanks for the update, @ITWhisperer. We are doing extraction during search, but the user doesn't want duplicates in the Splunk events either, so we implemented the isinstance method to check whether the data exists. Is there any other way to check for duplicates?
It depends - do you mean "duplicate" events being returned in your search? What is the level of duplication? Is it the whole event i.e. if a single character is different then it is not a duplicate? Or is it that a particular field or set of fields have unique values? Or some other criteria that you would use to determine if an event is a duplicate?
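To make the difference between those criteria concrete, here is a minimal Python sketch. It assumes events are available as plain dicts; the `event_key` helper and the sample field names (`host`, `status`, `msg`) are hypothetical, not anything from Splunk. It only illustrates that the same two events can count as duplicates under one definition and not under another.

```python
import hashlib
import json


def event_key(event, fields=None):
    """Return a fingerprint for an event dict (hypothetical helper).

    fields=None  -> the whole event must match to count as a duplicate
    fields=[...] -> only the named fields must match
    """
    subset = event if fields is None else {f: event.get(f) for f in fields}
    # Sort keys so that field order does not change the fingerprint.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two sample events that differ by a single character in "msg".
a = {"host": "web01", "status": 500, "msg": "timeout after 30s"}
b = {"host": "web01", "status": 500, "msg": "timeout after 31s"}

# Whole-event criterion: any difference means "not a duplicate".
whole_event_match = event_key(a) == event_key(b)

# Field-set criterion: only host and status are compared.
field_match = event_key(a, ["host", "status"]) == event_key(b, ["host", "status"])
```

Which key is right depends entirely on the answer to the questions above, so it is worth settling that before choosing any deduplication mechanism.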
Basically, we insert data using the REST API. Every hour our stream events get called and dump all the data; to avoid this we use a lookup before insertion. In the UI, if we remove duplicates, it works as expected, but the events themselves contain a lot of duplicate values, which take up a lot of space and slow down performance.
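If the script that calls the REST API is under your control, one way to avoid scanning all existing data on every run is to remember a fingerprint of each event already sent and check new events against that set. The sketch below is only an illustration under that assumption: `send` is a hypothetical stand-in for whatever function posts an event, `seen` would need to be persisted between hourly runs (for example in a file or small database), and nothing here runs inside Splunk itself.

```python
import hashlib
import json


def fingerprint(event):
    """Hash the whole event; key order does not affect the result."""
    canonical = json.dumps(event, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def send_new_events(events, seen, send):
    """Call send() only for events not already in `seen`; return the count sent.

    `send` is a hypothetical callable standing in for the REST call.
    """
    sent = 0
    for event in events:
        fp = fingerprint(event)
        if fp in seen:  # average O(1) set lookup instead of a full scan
            continue
        send(event)
        seen.add(fp)
        sent += 1
    return sent


# Simulate two hourly runs where the second run re-dumps the old data.
seen = set()
outbox = []  # stands in for the events that would actually be posted
first_run = [{"id": 1, "msg": "a"}, {"id": 2, "msg": "b"}]
second_run = first_run + [{"id": 3, "msg": "c"}]

first_sent = send_new_events(first_run, seen, outbox.append)
second_sent = send_new_events(second_run, seen, outbox.append)
```

In this simulation the second run forwards only the one new event, so the repeated dump never reaches the index.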
If you mean some sort of pre-indexing lookup, then the indexing / ingestion process in Splunk is not really designed for that. Any pre-indexing lookup / search would slow down the indexing process far too much and would be more likely to cause other issues. You would be better off doing your deduplication as part of the search process, which you could then use to populate a summary index with just the deduplicated events (or better yet, the aggregated results, depending on your use case).