The sentinelone.py process for the Applications channel input has been running under a single PID for several days and does not appear to be respecting the checkpoint. @aplura_llc_supp, any assistance on whether this is expected behavior, and if not, how the issue might be resolved, would be greatly appreciated.
We have three SentinelOne input channels enabled (Agents, Threats, Applications).
To check for duplicate events, we strip the per-run modular_input_consumption_time value and count identical _raw events:
index=sentinelone_index sourcetype="sentinelone:channel:agents" earliest=1 latest=now
| rex mode=sed field=_raw "s/, \"modular_input_consumption_time\": \"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} (\+|\-)\d{4}\"/, \"modular_input_consumption_time\": \"\"/g"
| stats count as dup_count by _raw | stats count by dup_count
The checkpoints for all three input channels appear to be getting set correctly under $SPLUNK_HOME/var/lib/splunk/modinputs/sentinelone/:
/opt/splunk/var/lib/splunk/modinputs/sentinelone/usea1-014.sentinelone.net_sentinelone-input-efd4172-40fe-b76-811f-c8cdf72132e-channel-applications.json
{"next_page": "", "last_execution": "1637470721"}
The input also appears to get the checkpoint successfully:
2021-11-29 07:40:21,521 log_level=INFO pid=31196 tid=MainThread file="s1_client.py" function="get_channel" line_number="373" version="sentinelone_app_for_splunk.v5.1.2.b35" action=calling_applications_channel status=start start=1637470721000 start_length=13 start_type=<class 'str'> end=1638193221000 end_length=13 end_type=<class 'str'> checkpoint=1637470721 channel=applications
2021-11-29 07:40:21,521 log_level=WARNING pid=31196 tid=MainThread file="s1_client.py" function="get_channel" line_number="365" version="sentinelone_app_for_splunk.v5.1.2.b35" action=got_checkpoint checkpoint={'next_page': '', 'last_execution': '1637470721'} channel=applications
The Input Add-On for SentinelOne App For Splunk (IA-sentinelone_app_for_splunk) is installed on the heavy forwarder. The Technology Add-On for SentinelOne App For Splunk (TA-sentinelone_app_for_splunk) is installed on a search head cluster and a standalone Enterprise Security search head. The SentinelOne App For Splunk is not currently installed.
All input channels are producing events.
Any ideas on how to troubleshoot and/or resolve this issue would be appreciated.
So the Agents and Applications channels are non-timestamp-aware data ingests, by design. Agents might run for more than an hour, causing a backup, but should not run for days. Adjust your interval to 86400 for both Agents and Applications and see if it calms down and stops duplicating the data.
Applications API doesn't support time: https://usea1-partners.sentinelone.net/api-doc/api-details?category=agents&api=applications
Agents can support time, but the design was not to use it. This allows for lookup building for other data, even if the agents were created or updated outside the checkpoint window. https://usea1-partners.sentinelone.net/api-doc/api-details?category=agents&api=get-agents
Threats should be time-aware, and "respect" the checkpoint.
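Roughly, the logic looks like the sketch below (simplified, not the add-on's actual code; the API parameter names are illustrative assumptions):

import json
import time

def load_checkpoint(path):
    # Checkpoint format as shown above: {"next_page": "", "last_execution": "<epoch seconds>"}
    with open(path) as f:
        return json.load(f)

def build_query_params(channel, checkpoint):
    now_ms = int(time.time() * 1000)
    if channel == "threats":
        # Time-aware channel: only request records created after the checkpoint
        return {
            "createdAt__gte": int(checkpoint["last_execution"]) * 1000,  # assumed parameter name
            "createdAt__lte": now_ms,
        }
    # Agents / Applications: no time filter -- the full inventory is pulled every run,
    # and the checkpoint only records when that run happened
    return {}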
@Anonymous, thank you for the response. It's helpful to understand that the Applications and Agents APIs are not time-aware. How are the modular inputs designed to avoid ingesting the same data over and over?
Per your recommendation, I have adjusted the Applications input interval to 86400, but I don't expect that to resolve the issue (note: the Agents interval was already 86400).
Based on the XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX-sentinelone-modularinput.log files, this behavior appears to have been fairly consistent over the past 90 days.
Our concern is the long-running Applications input and the apparent duplication of effort. Is there anything else we can do to reduce the runtime of the Applications modular input?
Those two specifically are NOT designed to avoid ingesting similar data over time. They were designed as a "pull all each time" to support finding "missing" agents and other searches that require the entire data set each time. I have entered an issue into the project to investigate what can be done at those scales for the Applications endpoint. I'm not sure we can do anything without S1 changing what the endpoint supports, but we will take a look and try. You should be able to raise the bulk_import_limit to 2,000,000 and see if that helps pull in all of the data.
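As a rough illustration of the "pull all each time" pattern with a cap (the names and the exact role of bulk_import_limit here are illustrative, not the add-on's internals):

def pull_all(fetch_page, bulk_import_limit=500_000):
    # fetch_page(cursor) is a stand-in for one paged API call; it is assumed to return
    # (records, next_cursor), with next_cursor None on the last page
    records, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        records.extend(page)
        if cursor is None or len(records) >= bulk_import_limit:
            # A cap below the total application count would stop the pull before every
            # page is retrieved, which is why raising bulk_import_limit can help
            break
    return records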
Additionally, it is multi-threaded. If you have a sufficiently beefy HF, find the bin/s1_client.py file in the app root and update line 186 per the below:
p = mp.Pool(10)
CHANGE IT TO
p = mp.Pool(50)
This will increase the number of workers available to the paginator, and you should see an increase in ingest. Keep increasing it as you see fit, and document that it was changed; on an app upgrade it will be overwritten. We cannot make that configurable, as it might violate Splunk Cloud requirements.
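For context, that one-line change just widens the process pool that fans out the page fetches. Conceptually it looks something like this (a simplified sketch, not the actual s1_client.py code):

import multiprocessing as mp

def fetch_page(page_number):
    # Stand-in for the real API call that retrieves one page of applications
    return []

def paginate(page_numbers, workers=10):
    # Widening the pool (e.g. 10 -> 50) lets more pages be fetched concurrently;
    # the practical ceiling is whatever the SentinelOne console will serve in parallel
    pool = mp.Pool(workers)
    try:
        return pool.map(fetch_page, page_numbers)
    finally:
        pool.close()
        pool.join()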
Thanks for the suggestions. We are running the inputs on a heavy forwarder with a relatively light load, 12 CPUs, and 12 GB of memory. We increased the number of workers available to the paginator as suggested and saw a measurable decrease in throughput: with 10 workers we were seeing about 9,000/hr; with 50 workers we're seeing about 7,000/hr.
The Applications input will likely run even longer than four days now. If you have any other suggestions, that would be great. Thanks again for the assistance.