
Why is there no support for multi-threading with ML algorithms (n_jobs)?

frankwayne
Path Finder

Many (all?) of the sklearn algorithms support multi-threading through the n_jobs parameter. This is not exposed in Splunk, and neither fit nor apply seems to use more than one thread. Why? Are there plans to support this?

For example, I was trying to improve the performance of RandomForestClassifier by making it multi-threaded, but n_jobs is not supported in RandomForestClassifier.py (it is not one of the 'ints' in out_params), and multi-threading is not an option in the Settings tab of the MLTK UI either.
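The 'ints' detail above appears to refer to how the MLTK's Python algorithm wrappers convert the string parameters passed to fit into typed values before handing them to scikit-learn. A minimal sketch of that kind of whitelist shows why a parameter such as n_jobs never reaches the estimator unless it is listed. The helper `whitelist_params`, the example parameter lists, and the error message are hypothetical, not the MLTK's actual code. In this sketch an unlisted key raises an error; the real wrapper may handle it differently.

```python
# Hypothetical sketch of an MLTK-style parameter whitelist.
# The function name, parameter lists, and error message are illustrative
# assumptions, not the MLTK's real implementation.


def whitelist_params(params, ints=(), floats=(), strs=()):
    """Cast user-supplied string params to typed values; reject unlisted keys."""
    converters = {}
    for name in ints:
        converters[name] = int
    for name in floats:
        converters[name] = float
    for name in strs:
        converters[name] = str

    out = {}
    for key, raw in params.items():
        if key not in converters:
            # Anything not whitelisted (e.g. n_jobs) is stopped here.
            raise ValueError(f"Unexpected parameter: {key}")
        out[key] = converters[key](raw)
    return out


if __name__ == "__main__":
    # Illustrative whitelist; not the actual list in RandomForestClassifier.py.
    allowed_ints = ["random_state", "n_estimators", "max_depth"]

    print(whitelist_params({"n_estimators": "100", "max_depth": "8"}, ints=allowed_ints))
    # {'n_estimators': 100, 'max_depth': 8}

    try:
        whitelist_params({"n_estimators": "100", "n_jobs": "4"}, ints=allowed_ints)
    except ValueError as err:
        print(err)  # Unexpected parameter: n_jobs
```

In this sketch, adding "n_jobs" to the ints list would let the value through to the estimator, which is the change the question is asking about.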


astein_splunk
Splunk Employee

Hi! We do not expose these settings because the MLTK treats machine learning as a first-class citizen in Splunk's SPL paradigm and tries to stay true to common SPL behaviors. Splunk's SPL commands do not expose multithreading options; all of that is abstracted away by the SPL system. Because the ML SPL commands in the MLTK (fit and apply) mostly use only search head resources, we want to be mindful of other production workloads that may be running on the shared Splunk infrastructure. If you are looking for massive-scale machine learning, I suggest looking at the Splunk MLTK Connector for Apache Spark (via Splunk beta) or the Splunk MLTK Container for TensorFlow (via PS), both of which use non-Splunk infrastructure for those large machine learning workflows.
