Hi,
I noticed that if I send the exact same event twice, _time included, the two are not merged. While investigating, I discovered the _indextime field, which could explain why they are considered two different events.
Is it possible to set _indextime to the value of _time? Will that "merge" my identical events? And if not, is there a way to do it at index time?
Thanks !
I don't think there is any way to influence the _indextime field; it is simply Splunk's internal record of when the event was indexed. But even if you could, Splunk doesn't merge identical events.
If you want to filter out some of the incoming events, provide some more details on how you are ingesting them (which input method, etc.) and what the events look like.
What you can do, of course, is apply the dedup command at search time to remove duplicates from your search results. It's typically more efficient to dedup based on a specific field, but you can also do a dedup _raw to dedup based on the entire original event (note: _raw does not include metadata fields like _indextime, it is just the original raw event itself).
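As a rough sketch, a search along these lines would keep only one copy of each identical raw event (index=main is just a placeholder here, point it at whatever index your data lands in):

index=main sourcetype=sales
| dedup _raw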
Thanks for your help. I receive the events via an HTTP endpoint (the HTTP Event Collector), and the payload is JSON. A simplified equivalent of the requests that are sent is:
curl -k "https://splunk:8088/services/collector" \
-H "Authorization: Splunk 1c0afd4d-d802-gg2c-9fc2-0f428217adf7" \
-d '{"event": {"Owner": "Toto", "Title": "Hello", "Date":"2018-02-02 11:45:23"}, "sourcetype": "sales", "time":"2018-02-02 11:45:23"}'
I will use dedup if it is my last option, but I feel like it will be redundant to write this for every search request.
I don't see an obvious way to filter duplicates for such events during input/parsing phase, but perhaps someone else will come by who has a smart idea for that.
Can't you tune the data source somehow to reduce sending the same event multiple times?
I don't see a way besides dedup either. This should be fixed at the source/input, if possible.
No problem, thanks for trying 🙂
In fact I have very few duplicate events, but I can't reduce them any further by adjusting the data source.