
In the Machine Learning Toolkit, the apply command with probabilities=true returns very few results.


Hi everyone, I am trying to apply logistic regression in Splunk to predict phishing. This is my query:

| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true | table Sender Subject Sender_ip "predicted(Is Phishing)" "probability(Is Phishing=Yes)"

I am applying tfidf to the fields, followed by test_model, which is my logistic regression. The probability value is populated for only a few results; for the rest it is empty. Can someone please help me populate this value? Also, is there any other way to identify which fields the logistic regression based its classification of an email on?


Splunk Employee

According to "Understanding fit and apply" in the MLTK docs, apply, unlike fit, can work with null fields when it applies a model to generate a predicted field. However, you may not get all of the algorithm's functionality (such as probabilities) if that functionality relies on good data.

Is it possible that the fields your logistic regression is being applied to are null? If so, the probability field may not be populated because there isn't a continuous/valid value for each field.
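
One way to check this is to compare how often the input fields are present in events that got a probability and in events that did not. The search below is a minimal sketch, not a confirmed fix. It reuses the query from the question and assumes that the raw input fields are named Sender, Subject and Sender_ip, as in the table command. `<your base search>` is a placeholder, and prob_missing and the *_present names are illustrative. Add the URL and attachment fields under whatever names they have in your data.

```
<your base search>
| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true
| eval prob_missing=if(isnull('probability(Is Phishing=Yes)'), "yes", "no")
| stats count AS events, count(Sender) AS sender_present, count(Subject) AS subject_present, count(Sender_ip) AS sender_ip_present BY prob_missing
```

Because count(<field>) only counts events where the field is non-null, a *_present value well below events in the prob_missing="yes" row would suggest that null inputs are the cause. If the counts look the same in both rows, the explanation is probably elsewhere.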
