Hello,
I am populating a summary index with a search:
index=index1
| addinfo | collect index=summary
I want to schedule the above search to run multiple times a day, but due to the nature of the data, this will introduce duplicate events into the summary index. Is there a way for the populating search to add a field to index1, called isProcessed="true", so that the populating search can filter events by isnull(isProcessed) and duplicate events won't be added to the summary index?
Data can't be changed once it has been indexed, so the answer is no. What you can do instead is modify your summary index search so that it excludes events from index1 that are already present in the summary index.
e.g. If the index=index1 events have a unique primary key field, your search would look like this:
index=index1 NOT [search index=summary | stats count by primaryKeyField ] | addinfo | collect index=summary
Also, I would look more closely at why there are duplicates. Do the time ranges of successive runs of your summary index search overlap?
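If overlapping time ranges turn out to be the cause, one option is to snap each run to a fixed window. The following is only a sketch, assuming the search is scheduled once an hour (the hourly schedule and the one-hour window are assumptions, not something stated in the question):

```spl
index=index1 earliest=-1h@h latest=@h
| addinfo
| collect index=summary
```

Here `earliest=-1h@h latest=@h` restricts each run to the previous whole hour, so consecutive runs should not pick up the same events. The trade-off is that events indexed after their hour has already been summarized would be missed; if late-arriving data is a concern, filtering on index time with `_index_earliest` and `_index_latest` instead may be worth testing.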
@somesoni2
That's awesome. I wasn't successful in excluding events with the stats command. I changed to the table command and verified the search works. Thanks!
index=index1 NOT [search index=summary | table primaryKeyField ] | addinfo | collect index=summary
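A likely reason the stats version did not exclude anything: a subsearch hands every field it outputs back to the outer search as a filter, so `stats count by primaryKeyField` produces conditions on both primaryKeyField and count, and index1 events have no matching count field. If you still want the deduplication that stats gives you, a sketch along these lines should return only the key field (primaryKeyField is the placeholder name from the answer above):

```spl
index=index1 NOT [search index=summary | stats count by primaryKeyField | fields primaryKeyField ]
| addinfo
| collect index=summary
```

Keep in mind that subsearches are capped by limits.conf (with default settings, 10,000 results and 60 seconds of run time, assuming those defaults have not been changed), so on a large summary index the exclusion list may be silently truncated. Restricting the subsearch to a time range is one way to stay under the cap.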