I am trying to fetch logs from a REST URL. But whenever the URL is hit, all of the data is fetched again rather than only the new data. So I am planning to implement checkpoint values so that only the data that has not yet been indexed is fetched from that REST URL.
Is there any way to achieve this? Any help or response will be very much appreciated.
The main script, rest.py, doesn't use the checkpoint directory. It doesn't write anything to it, and it doesn't read anything from it.
I've been trying my hand at adding a custom handler to it. In a handler, you can modify the req_args, such as the query parameters. I was using that to advance my "start" query parameter to the previous "end" parameter.
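For anyone who wants to see roughly what I mean, here is a minimal sketch of such a handler. The class name is made up, the `__call__` signature is written from my memory of the handlers that ship with the add-on (check it against your copy), and I'm assuming the query parameters live under `req_args["params"]`. Dropping "end" so the next request is open-ended is also just an assumption about how the remote API behaves.

```python
class AdvanceWindowHandler:
    """Hypothetical response handler that moves the query window forward."""

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output,
                 response_type, req_args, endpoint):
        # Emit the raw response. A real handler would use whatever output
        # helper the add-on provides; print() is only a stand-in here.
        print(raw_response_output)

        # Assumption: query parameters are kept in req_args["params"].
        params = req_args.setdefault("params", {})

        if "end" in params:
            # The next poll starts where this one ended.
            params["start"] = params["end"]
            # Assumption: omitting "end" means "up to now" for this API.
            del params["end"]
```

Called with `req_args = {"params": {"start": "100", "end": "200"}}`, this leaves `req_args["params"]` as `{"start": "200"}` for the next request.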
After the custom handler is done, the main script checks to see if any of the special req_args (such as the query parameters) have changed, and if so, it updates its own stanza in the inputs.conf.
However, when its stanza changes, Splunk restarts the script. This in and of itself isn't the end of the world, but if you're using a polling interval, it basically ignores your sleep and just spawns a new script which executes its first poll right away. The end result is that you're polling every few seconds (as fast as my burdened workstation can do it).
I've not tried it, but I suspect that if you used cron polling instead of the interval (though I don't think the REST Modular Input UI exposes it), it would pause before it queries for data, since the cron pause is at the front of the script, not the end.
I've debated how far to try and bend the code because at some point I might be better off writing something from scratch.
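If I did write it from scratch, the checkpoint part would probably look something like the sketch below. This is not what rest.py does. All of the names are made up, and `fetch` stands in for whatever function calls the REST URL with the saved cursor and returns the new events plus the next cursor value.

```python
import json
import os
import tempfile


def load_checkpoint(path, default=None):
    """Return the saved cursor, or `default` if no usable checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)["cursor"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return default


def save_checkpoint(path, cursor):
    """Write the cursor to a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"cursor": cursor}, f)
    os.replace(tmp_path, path)


def poll_once(fetch, checkpoint_path, initial_cursor=0):
    """Fetch only what is newer than the checkpoint, then advance it."""
    cursor = load_checkpoint(checkpoint_path, initial_cursor)
    events, new_cursor = fetch(cursor)
    for event in events:
        print(event)  # stand-in for handing the event to the indexer
    # Save only after the events were emitted, so a crash before this
    # point re-fetches them instead of skipping them.
    save_checkpoint(checkpoint_path, new_cursor)
    return events
```

Because the cursor lives in a file rather than in the inputs.conf stanza, updating it would not touch the stanza, so it should not trigger the restart described above.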