
SentinelOne Applications Channel Input runs for days

ericnewman
Explorer

The sentinelone.py process for the applications channel input has been running under a single PID for several days, and it does not appear to be respecting the checkpoint. @aplura_llc_supp, any guidance on whether this is expected behavior, and if not, how it might be resolved, would be greatly appreciated.

We have three SentinelOne input channels enabled (Agents, Threats, Applications).

  • The modular input for the threats channel is configured with an interval of 300. It seems to run fairly quickly (less than 2 minutes) and does not seem to be duplicating ingest.

     

  • The modular input for the agents channel is configured with an interval of 86400 and seems to run in about 45 minutes to 1 hour, but it does seem to be duplicating ingest based on the following search:

     

 

index=sentinelone_index sourcetype="sentinelone:channel:agents" earliest=1 latest=now
| rex mode=sed field=_raw "s/, \"modular_input_consumption_time\": \"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} (\+|\-)\d{4}\"/, \"modular_input_consumption_time\": \"\"/g"
| stats count as dup_count by _raw | stats count by dup_count

 

 

  • The modular input for the applications channel is configured with an interval of 3600. It runs for multiple days with the same PID and seems to be duplicating ingest based on a similar search.  It seems that it may not be respecting the checkpoint.
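The duplicate check in the search above can also be reproduced offline against exported raw events. This is only a rough sketch: the sample events and their `id` field are made up for illustration, and the only thing taken from the search is the pattern for the `modular_input_consumption_time` value.

```python
import re
from collections import Counter

# Same pattern as the rex sed expression: the ingest-time stamp that differs
# between otherwise identical events.
CONSUMPTION_TIME = re.compile(
    r', "modular_input_consumption_time": '
    r'"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}"'
)


def normalize(raw):
    """Blank out the consumption time so repeated pulls compare equal."""
    return CONSUMPTION_TIME.sub(', "modular_input_consumption_time": ""', raw)


def dup_histogram(raw_events):
    """Map 'times an event was seen' -> 'number of distinct events seen that often'."""
    per_event = Counter(normalize(raw) for raw in raw_events)
    return Counter(per_event.values())


# Hypothetical sample: the first two events differ only in consumption time.
sample = [
    '{"id": "a1", "modular_input_consumption_time": "Sun, 28 Nov 2021 07:40:21 -0600"}',
    '{"id": "a1", "modular_input_consumption_time": "Mon, 29 Nov 2021 07:40:21 -0600"}',
    '{"id": "b2", "modular_input_consumption_time": "Mon, 29 Nov 2021 07:41:02 -0600"}',
]

histogram = dup_histogram(sample)
for dup_count, events in sorted(histogram.items()):
    print(f"dup_count={dup_count} count={events}")
```

Any row with a dup_count above 1 means the same record was ingested more than once, which mirrors the `stats count by dup_count` output.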

The checkpoints for all three input channels appear to be set correctly under the following path: $SPLUNK_HOME/var/lib/splunk/modinputs/sentinelone/.

/opt/splunk/var/lib/splunk/modinputs/sentinelone/usea1-014.sentinelone.net_sentinelone-input-efd4172-40fe-b76-811f-c8cdf72132e-channel-applications.json

 

{"next_page": "", "last_execution": "1637470721"}

 

The input also appears to get the checkpoint successfully:

 

2021-11-29 07:40:21,521 log_level=INFO pid=31196 tid=MainThread file="s1_client.py" function="get_channel" line_number="373" version="sentinelone_app_for_splunk.v5.1.2.b35" action=calling_applications_channel status=start start=1637470721000 start_length=13 start_type=<class 'str'> end=1638193221000 end_length=13 end_type=<class 'str'> checkpoint=1637470721 channel=applications
2021-11-29 07:40:21,521 log_level=WARNING pid=31196 tid=MainThread file="s1_client.py" function="get_channel" line_number="365" version="sentinelone_app_for_splunk.v5.1.2.b35" action=got_checkpoint checkpoint={'next_page': '', 'last_execution': '1637470721'} channel=applications
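To make the epoch values above easier to read, here is a small read-only sketch that uses only the numbers copied from the checkpoint file and the log line. It does not touch the add-on, and how the add-on itself uses these values is not something this sketch can confirm.

```python
import json
from datetime import datetime, timedelta, timezone

# Checkpoint contents as shown above.
checkpoint = json.loads('{"next_page": "", "last_execution": "1637470721"}')
last_execution = int(checkpoint["last_execution"])

# start/end values (milliseconds) from the calling_applications_channel log line.
start_ms = 1637470721000
end_ms = 1638193221000

last_run = datetime.fromtimestamp(last_execution, tz=timezone.utc)
window = timedelta(milliseconds=end_ms - start_ms)

print("checkpoint time (UTC):", last_run.isoformat())
print("start matches checkpoint:", start_ms // 1000 == last_execution)
print("start-to-end window:", window)
```

The start value is the checkpoint in milliseconds, and the window between start and end is a little over 8 days.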

 

The Input Add-On for SentinelOne App For Splunk (IA-sentinelone_app_for_splunk) is installed on the heavy forwarder. The Technology Add-On for SentinelOne App For Splunk (TA-sentinelone_app_for_splunk) is installed on a search head cluster and a standalone Enterprise Security search head. The SentinelOne App For Splunk is not currently installed.

All input channels are producing events.

Any ideas on how to troubleshoot and/or resolve this issue would be appreciated.


aplura_llc_supp
Path Finder

So the Agents and Applications inputs are, by design, not timestamp-aware. Agents might be running for more than 1 hour, causing a backup, but it should not run for days. Adjust your interval to 86400 for both agents and applications and see if it calms down and stops duplicating the data.

Applications API doesn't support time: https://usea1-partners.sentinelone.net/api-doc/api-details?category=agents&api=applications

The Agents API can support time, but the input was designed not to use it. This allows for lookup building for other data, even if the agents have been created or updated past the checkpoint time. https://usea1-partners.sentinelone.net/api-doc/api-details?category=agents&api=get-agents

Threats should be time-aware, and "respect" the checkpoint.


ericnewman
Explorer

@aplura_llc, thank you for the response. It's helpful to understand that the applications and agents APIs are not time aware. How are the modular inputs designed to avoid ingesting the same data over and over?

Per your recommendation, I have adjusted the applications input interval to 86400, but I don't expect that to resolve the issue (note: the agents interval was already 86400).

Based on the XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX-sentinelone-modularinput.log files:

  • The agents input runs for about 40 minutes and processes 10,000-14,000 total items each run
  • The applications input runs over 4 days and processes 700,000-1,000,000 total items each run and frequently hits the bulk_import_limit=1,000,000

This behavior appears to be fairly consistent over the past 90 days.

Our concern is the long running applications input and the apparent duplication of effort.  Is there anything else we can do to reduce the runtime of the applications modular input?


aplura_llc_supp
Path Finder

Those two specifically are NOT designed to avoid the ingest of similar data over time. They were designed as a "pull all each time" to account for finding "missing" or other similar types of searches that require the entire data set of agents each time. I have entered an issue into the project to investigate what can be done at those scales for the applications endpoint. I'm not sure we can do anything without S1 changing what the endpoint supports, but we will take a look and try. You should be able to up the bulk_import_limit to 2,000,000 and see if that helps pull in the data and get it all.

Additionally, it is multi-threaded. If you have a sufficiently beefy HF, find the bin/s1_client.py file in the app root. Update line 186 per below:

p = mp.Pool(10)

CHANGE IT TO

p = mp.Pool(50)

This will increase the number of threads available to the paginator, and you should see an increase in ingest. Keep increasing it as you see fit, and document that it was changed. On an app upgrade, it will be overwritten. We cannot make that configurable, as it might violate Splunk Cloud requirements.
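The thread does not show how s1_client.py actually uses the pool, so the following is only a rough sketch of the general pattern of mapping page requests over a pool. It assumes `mp` refers to Python's `multiprocessing` module; `fetch_page` and `fetch_all` are hypothetical names, and `ThreadPool` is used here in place of `mp.Pool` so the example runs standalone.

```python
from multiprocessing.pool import ThreadPool


def fetch_page(page_number):
    """Hypothetical stand-in for one paginated API request."""
    return [f"item-{page_number}-{i}" for i in range(3)]


def fetch_all(page_numbers, pool_size):
    """Fetch pages concurrently; pool_size plays the role of the 10 or 50 above."""
    with ThreadPool(pool_size) as pool:
        pages = pool.map(fetch_page, page_numbers)  # results keep page order
    return [item for page in pages for item in page]


items = fetch_all(range(4), pool_size=10)
print(len(items))
```

In a pattern like this, a larger pool only helps while the worker count is the limiting factor. If the remote API, the network, or downstream processing is the bottleneck, extra workers may add overhead without adding throughput.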


ericnewman
Explorer

Thanks for the suggestions. We are running the inputs on a heavy forwarder with relatively light load, 12 CPUs and 12 GB memory. We changed the number of threads available to the paginator as suggested and saw a measurable decrease in throughput: with 10 threads we were seeing about 9,000 items/hr, and with 50 threads we're seeing about 7,000 items/hr.

The applications input will likely run even longer than 4 days now. If you have any other suggestions, that would be great. Thanks again for the assistance.
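For a rough sense of scale, this back-of-the-envelope calculation combines the two observed rates with the 700,000 to 1,000,000 item range from the logs. It assumes the rate stays constant for the whole run, which may not hold in practice.

```python
def hours_to_ingest(total_items, items_per_hour):
    """Hours needed to pull total_items at a constant hourly rate."""
    return total_items / items_per_hour


for rate in (9000, 7000):  # observed items/hr with 10 and 50 threads
    for total in (700_000, 1_000_000):  # observed items per run
        hours = hours_to_ingest(total, rate)
        print(f"{rate}/hr, {total:,} items: {hours:.1f} h ({hours / 24:.1f} days)")
```

At 9,000 items/hr, a 1,000,000 item run takes about 4.6 days; at 7,000 items/hr, it takes about 6 days.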
