According to http://docs.splunk.com/Documentation/MLApp/2.4.0/User/Configurefitandapply and http://docs.splunk.com/Documentation/MLApp/2.4.0/User/Customsearchcommands, the fit command is supposed to be used like this:
fit <algorithm> (<option_name>=<option_value>)* (<algorithm-arg>)+ (into <model_name>)? (as <output_field>)?
My SPL is:
fit RandomForestClassifier max_memory_usage_mb=10000 max_model_size_mb=500 "targetField" from "dataField1"...
and I get this error:
Error in 'fit' command: Error while initializing algorithm "RandomForestClassifier": Unexpected parameter: max_model_size_mb
The real problem is that only 1000 results are returned from 5200+ events, and I want all of the events back. Can someone help me get all the results? Thanks.
Hey _jgpm_
The parameters you are trying to set are related to computing resources. They are not algorithm options to pass to fit; they are supposed to be configured in mlspl.conf, as documented here:
http://docs.splunk.com/Documentation/MLApp/3.0.0/User/Configurefitandapply
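A minimal sketch of what that could look like, assuming the toolkit is installed under the default Splunk_ML_Toolkit app directory and that your MLTK version supports both settings and per-algorithm stanzas (check the mlspl.conf page for your version, since 2.4 and 3.0 may differ). The values are simply the ones from your search, not recommendations:

```ini
# $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf
# (path assumes the default app directory name)

# This stanza applies only to RandomForestClassifier;
# settings under [default] would apply to every algorithm instead.
[RandomForestClassifier]
max_memory_usage_mb = 10000
max_model_size_mb = 500
```

With the limits in the config file, the search itself no longer passes them: ...| fit RandomForestClassifier "targetField" from "dataField1"... You may need to restart Splunk before the new settings are picked up.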
As for your real problem: if the number of results is smaller than the number of input events, it's likely that many events are being dropped because of missing values; events with missing values are not used in model fitting. Can you check whether that's the case, and post here if it's not?
Hope it helps~
Thanks! I'm still on v2.4, which is why I linked to that version. I don't know how I missed the part about mlspl.conf; I guess it was late in the day and I was rushing. I haven't tried it yet, so I don't know if it will solve my problem. I asked the question because I know sort is limited to 10k results unless you add the "0" option, and I thought fit might have something similar. I don't think null values are causing the drops, because it would be highly unlikely for 5209 events to drop to exactly 1000... but I could be wrong. That one is actually easy to test and doesn't require restarting Splunk.
So I added
...| eval countNull=0 | foreach * [eval countNull=if(isnull('<<FIELD>>'), countNull +1, countNull) ]
to the main search, and countNull came out as 0 for every single event. To validate that the SPL itself works, I tried a variant that I knew would produce nonzero counts:
...| eval countNull=0 | foreach * [eval countNull=if(match('<<FIELD>>', "\d+"), countNull +1, countNull) ]
and countNull jumped to roughly 40 for each event. So I don't think there are any null values. I also did a quick visual scan and didn't find any, though it wasn't exhaustive.
The number of results a fit command outputs can be limited by maxresultrows in limits.conf and by max_inputs in mlspl.conf. That is probably not the cause if you haven't changed the default settings, but you can double-check.
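For reference, a rough sketch of where those two settings would go if you override them in local/ copies of the files. The [searchresults] stanza is an assumption on my part: limits.conf has a maxresultrows setting in more than one stanza, so check limits.conf.spec for your Splunk version. The numbers are illustrative only, not the shipped defaults:

```ini
# File 1: $SPLUNK_HOME/etc/system/local/limits.conf
# (stanza choice is an assumption; see limits.conf.spec)
[searchresults]
maxresultrows = 50000

# File 2: $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf
# (path assumes the default app directory name)
[default]
max_inputs = 100000
```

If either value is set at or near 1000 on your instance, that would explain the cutoff; otherwise the defaults are unlikely to be the problem.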
Also, you did check every single event for null values, right? 5209 events seems like a lot to check visually; you can append "| stats count by countNull" to your search above to see how many events have each count.