Splunk IT Service Intelligence

In the Machine learning toolkit, apply command with probabilities=true returns very few results.


Hi everyone, I am trying to apply logistic regression in Splunk to predict phishing, this is my query:

| apply tfidf_sender | apply tfidf_subject | apply tfidf_sender_ip | apply tfidf_url | apply tfidf_Attachments_MD5
| apply test_model probabilities=true | table Sender Subject Sender_ip "predicted(Is Phishing)" "probability(Is Phishing=Yes)"

I am applying tfidf on the fields followed by the test_model which is my logistic regression, the value for probability is populated only for a very few fields, for the rest of the fields it is empty. Can someone please help me on how to populate this value? Is there any other way to identify based on which fields, logistic regression has classified my email?

0 Karma

Splunk Employee
Splunk Employee

When we look at "Understanding fit and apply" from the MLTK docs, we see that apply can use null fields, unlike fit, when applying models to generate an predicted field . However you may not get all the functionality of the algorithm (like probabilities) if those other functionalities are reliant on good data.

Is it possible that the fields you logistic regression is being applied to are null? So the probabilities field isn't being populated because there isn't a continuous/valid value for each field?

0 Karma
Get Updates on the Splunk Community!

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

(view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...

Adoption of Infrastructure Monitoring at Splunk

  Splunk's Growth Engineering team showcases one of their first Splunk product adoption-Splunk Infrastructure ...