Machine Learning tool kit v3.4 model not returning result

teresachila — Tue, 29 Sep 2020 21:13:13 GMT

I just upgraded the MLTK from v2.2 to v3.4, along with the latest Python for Scientific Computing add-on (PSC). After this change, I noticed that my Random Forest model returns empty results for some rows. (I apply the model to a few thousand rows each time.) At first I thought it was an input data problem. But when I took a row that had returned an empty result and ran it individually (i.e. with | head 1), the model returned a result. Then I thought the problem might be that the model was built in v2.2, so I rebuilt (re-fit) the model in v3.4. Again it returned empty results for some rows, but for a different subset of rows this time.

Has anyone seen the same issue? Should I revert to the old version?

I don't see anything in search.log that will help, but I always see this:
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: File "/opt/splunk/etc/apps/Splunk_ML_Toolkit/bin/util/search_util.py", line 114, in add_distributed_search_info
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: raise RuntimeError('Failed to load model "%s": ' % (process_options['model_name']))
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - stderr: KeyError: 'model_name'
09-10-2018 22:08:54.625 ERROR ChunkedExternProcessor - Error in 'apply' command: (KeyError) 'model_name'

Is it trying to distribute the apply command to the indexers? Can I run it locally on the search head instead, since all my input data (CSV and KV store) is on the search head?

Re: Machine Learning tool kit v3.4 model not returning result

grana_splunk — Tue, 11 Sep 2018 17:15:40 GMT

Is it a distributed setup or a search head cluster? Are you using streaming apply on the indexers? If so, did you upgrade PSC on all the indexers? You need to recreate your model after upgrading the PSC version.

Re: Machine Learning tool kit v3.4 model not returning result

teresachila — Tue, 11 Sep 2018 21:55:06 GMT

It is set up for distributed search to multiple indexers, not as a search head cluster. How do I know whether I'm using streaming apply? I only upgraded PSC on the search head, not on the indexers.

Re: Machine Learning tool kit v3.4 model not returning result

grana_splunk — Tue, 29 Sep 2020 21:14:07 GMT

Open the mlspl.conf file at $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/default/mlspl.conf or $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/local/mlspl.conf and check whether streaming_apply is set to true. Settings in local/ override those in default/.

Also, if you have upgraded the setup and streaming_apply is true, please upgrade PSC on all your indexers.
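If you want to script that check, here is a rough Python sketch that resolves streaming_apply from those two files. It assumes the usual Splunk layering: local/ overrides default/, and a stanza named after an algorithm (for example [RandomForestClassifier], assumed here) overrides [default]. The function name streaming_apply_enabled and this exact precedence are my assumptions, and Splunk's .conf syntax is only approximately INI, so treat the result as a hint. Splunk's own btool utility (e.g. $SPLUNK_HOME/bin/splunk btool mlspl list --debug) should show the authoritative merged values.

```python
import configparser
import os


def _read_conf(path):
    """Parse one .conf file; returns an empty parser if the file is missing."""
    # interpolation=None: values may contain '%'. strict=False: allow the
    # duplicate [default] header created by the prepend below.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep key case as written
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as handle:
            # Splunk treats settings before the first stanza as [default];
            # prepending a header mimics that and avoids MissingSectionHeaderError.
            parser.read_string("[default]\n" + handle.read(), source=path)
    return parser


def streaming_apply_enabled(splunk_home, algorithm=None):
    """Return True if streaming_apply resolves to a true value.

    Assumed precedence (a simplification of Splunk's layering):
    algorithm stanza before [default], and local/ before default/.
    """
    app_dir = os.path.join(splunk_home, "etc", "apps", "Splunk_ML_Toolkit")
    layers = [
        _read_conf(os.path.join(app_dir, "local", "mlspl.conf")),
        _read_conf(os.path.join(app_dir, "default", "mlspl.conf")),
    ]
    stanzas = ([algorithm] if algorithm else []) + ["default"]
    for stanza in stanzas:        # more specific stanza first
        for parser in layers:     # local/ before default/
            if parser.has_option(stanza, "streaming_apply"):
                value = parser.get(stanza, "streaming_apply").strip().lower()
                return value in ("1", "true", "t", "yes")
    return False  # not set anywhere: treat as disabled
```

Run it on the search head with your $SPLUNK_HOME path; pass the algorithm name if you have per-algorithm stanzas.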

Re: Machine Learning tool kit v3.4 model not returning result

teresachila — Tue, 18 Sep 2018 13:27:53 GMT

I think I found the issue. For some reason, the new version does not handle nulls, or anything close to null, being passed to the model. It does not accept empty strings (i.e. "", or len=0), and it does not accept the string values "NA", "N/A", or "null" either. (The "NA" values were returned by an external API.)

So far I have observed three different symptoms:
1) the model returns an empty prediction value, with no other messages in the log;
2) the model fails with an error message about null values being passed;
3) the model returns a prediction, but search.log contains a warning about a null value in the model.
Which symptom appears depends on how many rows are being processed. If I apply the model to 1 row, it usually returns a prediction value. If I apply it to thousands of rows, it usually returns an empty value.

To work around it, I added the following to the search:

| fillnull value="NoValue"
| foreach prefix_*  [eval <<FIELD>>=if(len(<<FIELD>>)=0 OR <<FIELD>>="N/A" OR <<FIELD>>="NA" OR <<FIELD>>="null","NoValue",<<FIELD>>)]
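If you want to check the same cleanup outside Splunk (for example on rows exported to CSV), the logic of the two SPL lines above can be mirrored in Python. This is a hypothetical helper, sanitize_row, not part of MLTK. Missing values (None) in any field become "NoValue", as fillnull does. Empty strings and the strings "N/A", "NA", and "null" are replaced only in fields whose names start with prefix_, as the foreach does. Comparisons are case-sensitive, as in the eval expression.

```python
from typing import Dict, Optional

# Placeholder and null-like markers taken from the SPL workaround.
PLACEHOLDER = "NoValue"
NULL_LIKE = {"N/A", "NA", "null"}


def sanitize_row(row: Dict[str, Optional[str]], prefix: str = "prefix_") -> Dict[str, str]:
    """Return a copy of row with missing and null-like values replaced."""
    cleaned = {}
    for field, value in row.items():
        if value is None:
            # fillnull value="NoValue": applies to every field.
            cleaned[field] = PLACEHOLDER
        elif field.startswith(prefix) and (len(value) == 0 or value in NULL_LIKE):
            # foreach prefix_* [eval ...]: only prefixed fields get these checks.
            cleaned[field] = PLACEHOLDER
        else:
            cleaned[field] = value
    return cleaned
```

Note that a field outside prefix_* holding "NA" is left unchanged, just as in the SPL version; widen the foreach pattern if other input fields can carry such values.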

Re: Machine Learning tool kit v3.4 model not returning result

teresachila — Tue, 18 Sep 2018 13:56:56 GMT

Thanks! streaming_apply is set to false.