Splunk ITSI

In the Machine learning toolkit, apply command with probabilities=true returns very few results.

KrithikaRamakri
Explorer

Hi everyone, I am trying to apply logistic regression in Splunk to predict phishing, this is my query:

sourcetype="incoming_email"
| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true | table Sender Subject Sender_ip "predicted(Is Phishing)" "probability(Is Phishing=Yes)"

I am applying tfidf on the fields followed by the test_model which is my logistic regression, the value for probability is populated only for a very few fields, for the rest of the fields it is empty. Can someone please help me on how to populate this value? Is there any other way to identify based on which fields, logistic regression has classified my email?

0 Karma

astein_splunk
Splunk Employee
Splunk Employee

When we look at "Understanding fit and apply" from the MLTK docs, we see that apply can use null fields, unlike fit, when applying models to generate an predicted field . However you may not get all the functionality of the algorithm (like probabilities) if those other functionalities are reliant on good data.

Is it possible that the fields you logistic regression is being applied to are null? So the probabilities field isn't being populated because there isn't a continuous/valid value for each field?

0 Karma
Get Updates on the Splunk Community!

Accelerating Observability as Code with the Splunk AI Assistant

We’ve seen in previous posts what Observability as Code (OaC) is and how it’s now essential for managing ...

Integrating Splunk Search API and Quarto to Create Reproducible Investigation ...

 Splunk is More Than Just the Web Console For Digital Forensics and Incident Response (DFIR) practitioners, ...

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...