I am using the REST TA ( https://apps.splunk.com/apps/id/rest_ta ) to pull data from an API that outputs CSV data. The API only lets me pull either all events or the last 10 events, and since I need everything, I have to pull all events every time. This means there are duplicate events every time the REST TA pulls data.
I need to use a key of some sort to avoid duplicates and do not know where to start. Searching this answers board and Stack Overflow has not turned up the answers I need.
Is there a way to manually specify the ID/key used to index the data? If so, that would presumably prevent duplicates, since the key cannot be duplicated. Alternatively, what I am doing on Elasticsearch is reusing the same ID so that new data overwrites existing data in the index that has the same key. That is also a possibility, as some of the data from the source could have changed and needs to be updated (for example, whether the issue is pending or resolved).
Thanks in advance.
Does the API you are pulling data from have documentation?
If so, does this documentation have information on how to apply cursoring to your requests?
Typical cursoring approaches for REST APIs involve from/until timestamps in the HTTP request, or perhaps some sort of sequential event ID so that you only request events since that ID.
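To illustrate the from/until style of cursoring, here is a minimal sketch. The parameter names `from` and `until`, the function name `build_cursor_url` and the example URL are all assumptions for illustration; a real API would document its own names and timestamp format.

```python
from urllib.parse import urlencode


def build_cursor_url(base_url, last_seen, now):
    """Build a request URL that asks only for events in (last_seen, now].

    'from' and 'until' are hypothetical parameter names, not taken from any
    particular API.
    """
    params = {"from": last_seen, "until": now}
    return base_url + "?" + urlencode(params)


# After each successful poll, 'now' would be saved and reused as the next
# 'last_seen', so every request covers only the new time window.
url = build_cursor_url(
    "https://api.example.com/events",
    "2015-01-01T00:00:00Z",
    "2015-01-02T00:00:00Z",
)
```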
If there is nothing available to you at the API interface, then you will have to plug a custom response handler into the REST input stanza. This custom response handler could keep a log of event IDs, timestamps, etc., and then output only the event data that has not been seen before to Splunk for indexing. This would probably be very easy to do.
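As a rough sketch of that idea, the handler below remembers which keys it has already emitted in a small state file and skips rows it has seen. Several things here are assumptions: the class name `DedupingCsvHandler`, the `id` column name, the state file path and the `emit` callback are all hypothetical, and the `__call__` signature is modelled on what the handlers in the REST TA's `responsehandlers.py` appear to look like, so check it against your installed version. In the TA itself you would pass each row to whatever output helper the existing handlers use instead of `print`.

```python
import csv
import io
import os


class DedupingCsvHandler:
    """Hypothetical handler: emits only CSV rows whose key has not been seen."""

    def __init__(self, key_field="id", state_file="seen_keys.txt", emit=print):
        self.key_field = key_field    # assumed name of the unique column
        self.state_file = state_file  # assumed location of the key log
        self.emit = emit              # stand-in for the TA's output helper

    def _load_seen(self):
        # Read the keys emitted by earlier polls, if any.
        if not os.path.exists(self.state_file):
            return set()
        with open(self.state_file, encoding="utf-8") as fh:
            return {line.strip() for line in fh if line.strip()}

    def __call__(self, response_object, raw_response_output,
                 response_type, req_args, endpoint):
        seen = self._load_seen()
        new_keys = []
        reader = csv.DictReader(io.StringIO(raw_response_output))
        for row in reader:
            key = row[self.key_field]
            if key in seen:
                continue  # already indexed on an earlier poll
            seen.add(key)
            new_keys.append(key)
            # Re-serialize the row as one CSV line per event.
            out = io.StringIO()
            csv.writer(out).writerow([row[name] for name in reader.fieldnames])
            self.emit(out.getvalue().strip())
        # Append the newly emitted keys so the next poll skips them.
        with open(self.state_file, "a", encoding="utf-8") as fh:
            for key in new_keys:
                fh.write(key + "\n")
```

Keying on a single column means a row whose other fields change later (pending to resolved, say) is not emitted again. If changed rows should be indexed as new events, one option is to log a hash of the whole row instead of just the key.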
Unfortunately, the API does not allow cursors or “events since”.
If I am using the REST TA as mentioned in the OP, what would I add for the customization you mentioned, and how and where would I add it? I am not using the REST input framework from Git, which I have seen referenced in relation to the custom response code.
Does the customization mean I need to write my own input module?
Is there no way to declare my unique key when the data comes in, with some sort of index-time field extraction, so that a duplicate event is either not inserted or updates the existing document?