MLTK Model Advantages and Disadvantages

rwgrant · 09-29-2020

I've been using MLTK 5.x as well as 4.x, each with the respective version of Python that works with it. The "mlspl.conf" file has been heavily modified to try to max out the MLTK app's capabilities without crashing the Splunk instance. I work at an MSSP that deals with a lot of customers, and custom modifications of that file were made for each one. Having done that, I've noticed the following across all of those customers' Splunk environments:

ML models that are large, and need to be large, will always have bundle replication issues. All ML models are just CSV files, so anything above roughly 200MB usually doesn't replicate well. On top of that, a model that size is extremely slow to query with "apply". Blacklisting the "__mlspl_*" lookups from replication helps with the bundle issue, but "apply" is still very slow.
Just using the "fit" command without saving a model (no "into" clause) always runs faster, even with a large amount of data. The algorithms I've seen this with:
- TFIDF
- RandomForestClassifier
- SVM
- DecisionTreeClassifier
- LogisticRegression
- PCA
- DensityFunction
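
For reference, the two patterns being compared look roughly like the sketch below. The index, field, and model names (example_index, label, feature_a, feature_b, example_model) and the time ranges are placeholders for illustration, not taken from any customer environment. The first search fits every time and saves nothing. The second trains once and saves the model with "into". The third scores new events against that saved model with "apply".

```
index=example_index
| fit RandomForestClassifier label from feature_a feature_b

index=example_index earliest=-30d
| fit RandomForestClassifier label from feature_a feature_b into example_model

index=example_index earliest=-15m
| apply example_model
```

The first search retrains on whatever events it returns and leaves no model behind. The second is the one that writes a "__mlspl_*" lookup, which is the file that ends up in the bundle.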

So, knowing these two points, what are the advantages and disadvantages of just using the "fit" command every time instead of training a model to use later?
