We have an issue with an external REST API that works properly 99% of the time, but once in a while it publishes data as "back dated".
Background: We have configured a Splunk Add-on to poll the API every 300 seconds. The HTTP GET request includes a REST API filter (let's say the filter is named "DateCreated") that indicates the time when an event was generated.
If everything worked as expected, the implementation would be:
Filter: DateCreated > (now - 300s)
Result: Returns all events from last 5 min time period
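To make the current polling logic concrete, here is a minimal sketch of how the filter could be computed on each run. Only the name "DateCreated" comes from our setup; the "gt:" operator syntax, the timestamp format, and the function name build_date_filter are placeholders for illustration, since the real API's filter syntax is not shown here.

```python
from datetime import datetime, timedelta, timezone

POLL_INTERVAL_SECONDS = 300  # the add-on's polling interval


def build_date_filter(now, window_seconds=POLL_INTERVAL_SECONDS):
    """Return query parameters selecting events created within the last window."""
    cutoff = now - timedelta(seconds=window_seconds)
    # Hypothetical filter syntax: "gt:<UTC timestamp>"; adjust to the real API.
    return {"DateCreated": "gt:" + cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}


if __name__ == "__main__":
    # Example: parameters for a poll happening right now
    print(build_date_filter(datetime.now(timezone.utc)))
```

Any event whose DateCreated is older than the cutoff at the time it is published will never match this filter, which is exactly the gap described next.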
However, once in a while the API publishes data with a DateCreated value that is back dated by up to several hours. Such events fall outside the 5-minute window of our initial implementation and are missed. We have also checked, and there is no other filter in the API that we could use to work around this issue in 100% of cases.
Potential solution: We have been considering a batch search, e.g. fetching all events from the API once every 24 hours. This would return all events from the past 24 hours (with high certainty including the back dated ones), and the add-on would then process them as follows:
Ingest events that have not been ingested as part of past API calls
Drop the ones that can be considered duplicates (already ingested as part of past API calls)
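The two processing steps above can be sketched as a simple partition over the batch result. This assumes each event carries a unique identifier; the field name "id" and the function name split_new_and_duplicates are hypothetical, and where the set of already-seen identifiers comes from is the open question discussed below.

```python
def split_new_and_duplicates(batch_events, seen_ids, id_field="id"):
    """Partition batch results into events to ingest and events to drop."""
    new_events, duplicates = [], []
    seen = set(seen_ids)  # copy so the caller's collection is not mutated
    for event in batch_events:
        event_id = event[id_field]
        if event_id in seen:
            duplicates.append(event)
        else:
            new_events.append(event)
            seen.add(event_id)  # also guards against repeats within the batch
    return new_events, duplicates
```

Only the events in the first returned list would be written to the index.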
Implementing a batch API call comes with another problem: it generates duplicate events in the index. We want to keep the index free of duplicate events because of our configured alerting and reporting logic.
To avoid duplicate events, we have been considering two options:
Option 1: The add-on would use the KV store, storing the unique identifier of every event during the original API call. The batch API call would then use the store for duplicate detection, dropping the events already ingested. The issue here is that the KV store grows over time and nobody cleans it up.
Is there a good way to clean up the KV store, either periodically or by setting a maximum size so that the oldest data is removed automatically? Ideally the cleanup would be performed by the add-on itself.
Option 2: As part of every batch API call, the add-on would make REST API calls against the Splunk index where the data is already ingested, extract the unique identifiers, and use them to drop duplicates.
Does an add-on have permissions to perform Splunk REST API calls natively without additional credentials?
If not, what would be the optimal way of creating and storing account information?
Is there an example implementation of an add-on calling the Splunk REST API?
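For the KV store option, the kind of cleanup we have in mind is an age-based prune run by the add-on before each batch call. The sketch below only builds the request and does not send it. It assumes the KV store data endpoint accepts a DELETE with a "query" parameter and that splunkd accepts a session key in an "Authorization: Splunk ..." header; whether the add-on actually has such a session key available is part of what we are asking. The app name, collection name, the "ingested_at" field, and the function name are all hypothetical.

```python
import json
import time
from urllib.parse import quote


def build_kvstore_prune_request(app, collection, max_age_seconds, session_key,
                                now=None, base_url="https://localhost:8089"):
    """Describe a DELETE request removing KV store records older than max_age_seconds."""
    now = time.time() if now is None else now
    cutoff = int(now - max_age_seconds)
    url = "{}/servicesNS/nobody/{}/storage/collections/data/{}".format(
        base_url, quote(app, safe=""), quote(collection, safe=""))
    # "ingested_at" is a hypothetical epoch-seconds field stored with each identifier
    query = json.dumps({"ingested_at": {"$lt": cutoff}})
    return {
        "method": "DELETE",
        "url": url,
        "params": {"query": query},
        # Assumes a splunkd session key is available to the add-on
        "headers": {"Authorization": "Splunk " + session_key},
    }
```

Since the batch call looks back 24 hours, identifiers would need to be retained for at least that long; a retention of, say, 48 hours would keep the collection bounded while leaving some margin.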
Are there any other potential implementation ideas?
In the end, we want to minimize admin overhead across different Splunk environments that perform the exact same API calls, but for different entities. Since we have multiple such environments, the solution should be easy to deploy and manage in all of them.