Looking for hints and suggestions on how to implement this:
I have incoming log data that contains EAN barcodes (logged as plain numbers). A third-party API provides further details about what is behind each barcode. That part works; I already have an external lookup command written in Python. However, the API is rate limited to 10 queries per minute and accepts only one barcode per query (no batch lookups). Therefore there is no way to do a "live" lookup while the end user is searching the data.
Therefore I need to pre-look-up each barcode as soon as the events come in and cache the results locally, so searches run against the enriched data. The rate of incoming events will often exceed 10 per minute during the day, so the pre-lookups will build up a backlog during the day and catch up again at night when there is less load. That means I need something like a FIFO buffer (one that survives a Splunk restart) and a constantly running job feeding the barcodes to the lookup. The same barcode may also show up multiple times, so the job must check whether a barcode is already in the buffer or has already been looked up earlier, to avoid duplicate lookups. Basically I need something like this:
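For the restart-safe, deduplicating FIFO part, one option (my assumption; a Splunk KV store collection would serve the same purpose) is a small SQLite table keyed on the barcode. `BarcodeQueue` and its method names are hypothetical, just to sketch the idea:

```python
import sqlite3


class BarcodeQueue:
    """FIFO buffer backed by SQLite so it survives restarts.

    A barcode that is already queued, or already looked up
    (status 'done'), is silently ignored on re-enqueue.
    """

    def __init__(self, path=":memory:"):
        # Use a real file path in production so the queue persists.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " barcode TEXT PRIMARY KEY,"               # dedup key
            " status  TEXT NOT NULL DEFAULT 'pending',"
            " added   INTEGER)"                        # insertion order
        )

    def enqueue(self, barcode):
        """Return True if the barcode was newly queued, False if known."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO queue (barcode, added) "
            "VALUES (?, (SELECT COALESCE(MAX(added), 0) + 1 FROM queue))",
            (barcode,),
        )
        self.db.commit()
        return cur.rowcount == 1

    def next_pending(self):
        """Oldest barcode not yet looked up, or None if caught up."""
        row = self.db.execute(
            "SELECT barcode FROM queue WHERE status = 'pending' "
            "ORDER BY added LIMIT 1"
        ).fetchone()
        return row[0] if row else None

    def mark_done(self, barcode):
        """Record that this barcode has been looked up."""
        self.db.execute(
            "UPDATE queue SET status = 'done' WHERE barcode = ?", (barcode,)
        )
        self.db.commit()
```

The `PRIMARY KEY` on the barcode column gives the dedup check for free: `INSERT OR IGNORE` drops repeats whether they are still pending or already looked up.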
Incoming log data -> check whether the barcode is already in the buffer or has already been pre-looked-up -> put the barcode into the FIFO buffer -> feed the buffer to the external lookup command at a rate of 10/minute -> write the lookup results to a CSV/KV store lookup so the end user can search the enriched data without being rate limited
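The "feed at 10/minute and write to CSV" steps of that pipeline might look like the sketch below. `drain` and `lookup_fn` are hypothetical names; `lookup_fn` stands in for the existing Python API call, and the queue is assumed to expose `next_pending()`/`mark_done()` as in a persistent FIFO:

```python
import csv
import time


def drain(queue, lookup_fn, csv_path, per_minute=10, budget=None):
    """Feed pending barcodes to lookup_fn at most `per_minute` per minute,
    appending results to a CSV that Splunk can use as a lookup file.

    `budget` optionally caps how many lookups a single run performs,
    which is handy if this runs as a scheduled job rather than a daemon.
    Returns the number of lookups performed.
    """
    interval = 60.0 / per_minute           # seconds between API calls
    done = 0
    with open(csv_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while budget is None or done < budget:
            barcode = queue.next_pending()
            if barcode is None:
                break                      # buffer drained; caught up
            writer.writerow([barcode, lookup_fn(barcode)])
            queue.mark_done(barcode)
            done += 1
            if queue.next_pending() is not None:
                time.sleep(interval)       # respect the API rate limit
    return done
```

A plain `sleep` between calls is the simplest way to stay under the limit; since the API allows only one barcode per query anyway, there is nothing to gain from concurrency here. Appending to a CSV keeps earlier results searchable, though a KV store lookup would make updating existing entries easier.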