Those two specifically are NOT designed to avoid ingesting similar data over time. They were designed as a "pull all each time" to account for finding "missing" agents or other similar types of searches that require the entire data set of agents each time.

I have entered an issue into the project to investigate what can be done at those scales for the applications endpoint. I'm not sure we can do anything without S1 changing the endpoint support, but we will take a look and try.

You should be able to raise the bulk_import_limit to 2,000,000 and see if that helps pull in all of the data.

Additionally, the collection is multi-threaded. If you have a sufficiently beefy HF, find the bin/s1_client.py file in the app root and update line 186 per below:

p = mp.Pool(10)

CHANGE IT TO

p = mp.Pool(50)

This will increase the number of threads available to the paginator, and you should see an increase in ingest rate. Keep increasing it as you see fit, and document that it was changed, because the file will be overwritten on an app upgrade. We cannot make that configurable, as it might violate Splunk Cloud requirements.
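For context, here is a minimal sketch of the kind of pattern that change affects. The function and variable names below are illustrative assumptions, not the actual s1_client.py code; it only shows why a larger pool size lets more pages be fetched in parallel.

import multiprocessing as mp

def fetch_page(page_number):
    # Hypothetical stand-in for one paginated API call; the real code
    # would request that page of results from the S1 endpoint.
    return ["record-%d-%d" % (page_number, i) for i in range(3)]

if __name__ == "__main__":
    pages = range(100)
    # The pool size caps how many pages are requested concurrently,
    # so raising 10 -> 50 increases parallel pagination.
    p = mp.Pool(50)
    try:
        results = p.map(fetch_page, pages)
    finally:
        p.close()
        p.join()
    records = [r for page in results for r in page]
    print("Fetched %d records across %d pages" % (len(records), len(pages)))

The trade-off is that a bigger pool means more simultaneous API requests and more load on the HF, which is why increasing it gradually and documenting the change is the safer approach.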