We have a number of files containing events. Each event has a unique id, but the same event (with the same event_id) can exist in other files as well. We want each event to be indexed only once in Splunk, and if an event with the same event_id comes in again, it should simply be skipped. This is the same as _id in Elasticsearch or a primary key in another database.
Is it possible to achieve the same in Splunk? If yes, How?
I don't know a great deal about it, so I'm not sure whether it can validate that an event already exists in Splunk before routing data there, but the DSP (Data Stream Processor) component may do what you want:
https://www.splunk.com/en_us/software/stream-processing.html
I'm not sure it is possible during indexing, but you can get unique events in SPL by using the dedup command.
https://docs.splunk.com/Documentation/Splunk/8.0.4/SearchReference/Dedup
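For example, a search along these lines keeps only the first occurrence of each event_id at search time (the index and sourcetype names here are placeholders; substitute your own):

```spl
index=my_index sourcetype=my_events
| dedup event_id
| table _time event_id _raw
```

Note that dedup removes duplicates from the search results only; all copies of the event are still stored in the index.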
Thanks
Thanks, @kamlesh_vaghela for the quick response.
But we have millions of events with this behavior, and every end user would have to use dedup in all of their queries.
Is there any other way to achieve the same?
Hi @akshgpt25 ,
as @kamlesh_vaghela said, it isn't possible to filter events before indexing to avoid duplicates, but you could ingest all the data and schedule a search that extracts the deduplicated events and stores them in a summary index; then you can run your searches against that index instead.
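As a sketch, the scheduled search could look something like this (index names and the schedule window are examples, not prescriptions):

```spl
index=my_index sourcetype=my_events earliest=-1h@h latest=@h
| dedup event_id
| collect index=my_summary_index
```

Schedule this to run hourly over the previous hour. One caveat: dedup only removes duplicates within a single run's time window, so duplicates of the same event_id arriving in different windows would still reach the summary index; depending on how far apart your duplicates arrive, you may need a wider window or a further dedup at search time.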
Ciao.
Giuseppe