
MLTK v2.4: fit returns a maximum of 1000 results, and max_memory_usage_mb/max_model_size_mb options cause errors

_jgpm_
Communicator

According to http://docs.splunk.com/Documentation/MLApp/2.4.0/User/Configurefitandapply and http://docs.splunk.com/Documentation/MLApp/2.4.0/User/Customsearchcommands, the fit command is supposed to be used like this:

 fit <algorithm> (<option_name>=<option_value>)* (<algorithm-arg>)+ (into <model_name>)? (as <output_field>)?

My SPL is:

fit RandomForestClassifier max_memory_usage_mb=10000 max_model_size_mb=500 "targetField" from "dataField1"...

and I get this error:

Error in 'fit' command: Error while initializing algorithm "RandomForestClassifier": Unexpected parameter: max_model_size_mb
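To make the documented syntax concrete, here is a minimal Python sketch that splits a fit command according to that grammar. It is an illustration, not MLTK's actual parser, and `parse_fit` is a hypothetical name. It shows that `max_memory_usage_mb=10000` and `max_model_size_mb=500` land in the `<option_name>=<option_value>` slot. That is consistent with the error being raised while initializing the algorithm: the resource settings are treated as algorithm parameters.

```python
import shlex
from typing import Dict, List, Optional


def parse_fit(command: str) -> Dict[str, object]:
    """Split a fit command following the documented grammar:
    fit <algorithm> (<option_name>=<option_value>)* (<algorithm-arg>)+
        (into <model_name>)? (as <output_field>)?
    """
    tokens = shlex.split(command)  # also strips the double quotes
    if len(tokens) < 2 or tokens[0] != "fit":
        raise ValueError("expected: fit <algorithm> ...")
    options: Dict[str, str] = {}
    args: List[str] = []
    model_name: Optional[str] = None
    output_field: Optional[str] = None
    i = 2
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("into", "as") and i + 1 < len(tokens):
            if tok == "into":
                model_name = tokens[i + 1]
            else:
                output_field = tokens[i + 1]
            i += 2
            continue
        if "=" in tok and not args:
            # key=value pairs before the first positional argument are options
            key, value = tok.split("=", 1)
            options[key] = value
        else:
            args.append(tok)
        i += 1
    return {"algorithm": tokens[1], "options": options, "args": args,
            "model_name": model_name, "output_field": output_field}


parsed = parse_fit('fit RandomForestClassifier max_memory_usage_mb=10000 '
                   'max_model_size_mb=500 "targetField" from "dataField1"')
print(parsed["options"])  # {'max_memory_usage_mb': '10000', 'max_model_size_mb': '500'}
print(parsed["args"])     # ['targetField', 'from', 'dataField1']
```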

The real problem is that I'm only getting 1000 results back from 5200+ events. I want to get all of the events back. Can someone help me get all the results? Thanks.


yangzd
Splunk Employee

Hey _jgpm_

The parameters you are trying to set are resource-related settings. They are meant to be configured in mlspl.conf rather than passed to fit, as documented here:
http://docs.splunk.com/Documentation/MLApp/3.0.0/User/Configurefitandapply
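As a rough illustration of moving these settings out of the search and into configuration, here is a sketch that reads an mlspl.conf-style file with Python's configparser. It assumes that a per-algorithm stanza such as [RandomForestClassifier] overrides the [default] stanza and inherits any setting it does not define. The numbers are illustrative, not MLTK defaults, and `resource_limits` is a hypothetical helper.

```python
import configparser
from typing import Dict

# Hypothetical local/mlspl.conf contents; numbers are illustrative only.
MLSPL_CONF = """
[default]
max_inputs = 100000
max_memory_usage_mb = 1000
max_model_size_mb = 30

[RandomForestClassifier]
max_memory_usage_mb = 10000
max_model_size_mb = 500
"""


def resource_limits(conf_text: str, algorithm: str) -> Dict[str, int]:
    """Resolve resource settings for one algorithm, assuming an algorithm
    stanza overrides [default] and inherits any setting it does not set."""
    parser = configparser.ConfigParser(default_section="default")
    parser.read_string(conf_text)
    name = algorithm if parser.has_section(algorithm) else parser.default_section
    return {key: int(value) for key, value in parser[name].items()}


print(resource_limits(MLSPL_CONF, "RandomForestClassifier"))
```

Under this assumed layering, RandomForestClassifier takes 10000 and 500 from its own stanza and inherits max_inputs from [default]. Any other algorithm falls back to [default] entirely.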

As for your real problem: if you get fewer results than input events, it's likely that many events are being dropped because of missing values, since events with missing values are not used in model fitting. Can you check whether that's the case, and post here if it's not?
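The check suggested above can be sketched in Python. Treat each event as a dict of field values, with None standing in for a null, and split the events into those that have every field fit uses and those that would be dropped. This illustrates the idea only and is not how MLTK implements it. `split_by_completeness` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Event = Dict[str, Optional[str]]


def split_by_completeness(
    events: Iterable[Event], fields: List[str]
) -> Tuple[List[Event], List[Event]]:
    """Partition events into those with a value for every field used by fit
    (kept) and those missing at least one (dropped). None stands for null."""
    kept: List[Event] = []
    dropped: List[Event] = []
    for event in events:
        if all(event.get(field) is not None for field in fields):
            kept.append(event)
        else:
            dropped.append(event)
    return kept, dropped


events = [
    {"targetField": "yes", "dataField1": "4.2"},
    {"targetField": "no", "dataField1": None},  # null feature value
    {"dataField1": "1.0"},                      # target field absent
]
kept, dropped = split_by_completeness(events, ["targetField", "dataField1"])
print(len(kept), len(dropped))  # 1 2
```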

Hope it helps~

_jgpm_
Communicator

Thanks! I'm still on v2.4, which is why I linked to those docs. I don't know how I missed the part about mlspl.conf; I guess it was late in the day and I was rushing. I haven't tried it yet, so I don't know whether it will solve my problem. I asked the question because I know sort is limited to 10k results unless you add the "0" option, and I thought fit might have something similar. I don't think null values are causing the drops, because it would be highly unlikely for 5209 events to drop to exactly 1000... but I could be wrong. That one is actually easy to test and doesn't require restarting Splunk...

So I added

...| eval countNull=0 | foreach * [eval countNull=if(isnull('<<FIELD>>'), countNull +1, countNull) ]

to the main search, and countNull showed up as 0 for every single event. To validate that the SPL works, I tried something that I knew would work:

...| eval countNull=0 | foreach * [eval countNull=if(match('<<FIELD>>', "\d+"), countNull +1, countNull) ]

and countNull jumped to roughly 40 for each event. So I don't think there are any null values. I also did a quick visual scan and didn't find any, but it wasn't exhaustive.
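For readers less familiar with foreach, here is a rough Python analog of the two searches above. It assumes each event is a dict and None represents a null field. `count_nulls` mirrors the isnull() version, and `count_digit_matches` mirrors the match(..., "\d+") sanity check. Splunk's match() succeeds if the regex matches anywhere in the value, which corresponds to re.search rather than re.match. Both function names are hypothetical.

```python
import re
from typing import Dict, Optional

Event = Dict[str, Optional[str]]


def count_nulls(event: Event) -> int:
    """Analog of the foreach * loop that adds 1 for each isnull() field."""
    return sum(1 for value in event.values() if value is None)


def count_digit_matches(event: Event) -> int:
    """Analog of the sanity check with match('<<FIELD>>', "\\d+").
    match() is unanchored, so re.search is the closer equivalent;
    null fields are skipped in this sketch."""
    return sum(
        1 for value in event.values()
        if value is not None and re.search(r"\d+", value)
    )


event = {"bytes": "512", "status": "200", "host": "web01", "note": None}
print(count_nulls(event), count_digit_matches(event))  # 1 3
```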


yangzd
Splunk Employee

The number of results fit outputs can be limited by maxresultrows in limits.conf and by max_inputs in mlspl.conf. That is probably not the cause if you haven't changed the default settings, but you can double-check.
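One way to reason about those two limits is a simplified model, assumed here for illustration only: the output is capped by whichever setting is smallest. The hypothetical values below show how a single limit of 1000 would produce exactly 1000 rows from 5209 events, matching the count in the question. `binding_limit` is a made-up helper, and the numbers are not claimed to be the actual defaults.

```python
from typing import Dict, Optional, Tuple


def binding_limit(n_events: int, limits: Dict[str, int]) -> Tuple[int, Optional[str]]:
    """Simplified model (an assumption): fit outputs at most the smallest limit.
    Returns the expected row count and the name of the limit that caps it,
    or None if no limit is below the number of input events."""
    name, cap = min(limits.items(), key=lambda item: item[1])
    if cap < n_events:
        return cap, name
    return n_events, None


# Hypothetical settings, chosen to show how exactly 1000 rows could appear.
limits = {"limits.conf maxresultrows": 1000, "mlspl.conf max_inputs": 100000}
print(binding_limit(5209, limits))  # (1000, 'limits.conf maxresultrows')
```

A suspiciously round output count is a hint that some limit, rather than data quality, is binding.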
Also, you did check every single event for null values, right? 5209 events seems like a lot to check visually; you can append "| stats count by countNull" to your search above.
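Appending "| stats count by countNull" collapses the per-event null counts into a histogram. That way you only need to confirm that a single row with countNull=0 accounts for all events. Here is a Python analog using collections.Counter, assuming events are dicts with None for null fields (`null_count_histogram` is a hypothetical name):

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def null_count_histogram(events: Iterable[Dict[str, Optional[str]]]) -> Counter:
    """Rough analog of '... | stats count by countNull': maps each
    null-field count to the number of events having that count."""
    return Counter(
        sum(1 for value in event.values() if value is None)
        for event in events
    )


# If no event has a null field, the histogram has a single key: 0.
events = [{"a": "1", "b": "2"}, {"a": None, "b": "3"}]
print(null_count_histogram(events))  # Counter({0: 1, 1: 1})
```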
