Is it possible to use cross-validation in the Machine Learning Toolkit and Showcase app?
Hello,
Is it possible to use cross-validation in the Machine Learning Toolkit and Showcase app?
Cross-validation is already in the Toolkit:
https://docs.splunk.com/Documentation/MLApp/4.1.0/User/ScoreCommand#K-fold_scoring
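Per the docs linked above, k-fold scoring is requested by adding the kfold_cv parameter to a fit command. A minimal sketch (the algorithm, target, and feature field names here are illustrative, not from the original thread):

... search to get your dataset | fit LogisticRegression target from feature_* kfold_cv=5

This holds out each of the 5 folds in turn, trains on the remaining 4, and returns one row of scoring metrics per fold, so no manual partitioning is needed.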
The assistants for Predict Numeric Fields and Predict Categorical Fields do 2-fold cross-validation for you automatically, and you can select the train-test split ratio of your choosing.
Well, if you are looking for automated cross-validation or a single command that performs it, the answer is probably no at this moment.
Here is what I do for now.
For example, for K-fold cross-validation with K=5, you can split your data into 5 partitions using the sample command.
... search to get your dataset | sample partitions=5
This adds a partition_number field to the dataset, so you can filter on that number to select one portion of the data.
Then use one partition (1/5 of the data) to create the model (as the training set) and the rest of the data for testing.
... search to get your dataset | sample partitions=5 | where partition_number=0 | fit ... into your_model | ..
and test with the rest
... search to get your dataset | sample partitions=5 | where partition_number!=0 | apply your_model | ..
Then calculate the errors and consolidate the results from each validation run.
You could probably automate this with other Splunk job-scheduling mechanisms (a scheduled search, or a summary index plus a dashboard).
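For the error-calculation step, one sketch for a numeric target (assuming a target field named y; apply writes its regression output to a field named predicted(y), and both field names here are assumptions for illustration):

... search to get your dataset | sample partitions=5 seed=1 | where partition_number!=0 | apply your_model | eval sq_err = pow(y - 'predicted(y)', 2) | stats avg(sq_err) as mse | eval rmse = sqrt(mse)

Repeat this with a different partition held out each time, then average the per-run errors to get the consolidated cross-validation estimate.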
To apply k-fold cross-validation (using 5 folds as in the above example), you should train with 4 folds and then test with the remaining fold. The code example above does the opposite. So it should be:
Train with 4 folds
| sample partitions=5 seed=1 | where partition_number!=0 | fit ... into your_model |
Test with 1 fold
| sample partitions=5 seed=1 | where partition_number=0 | apply your_model
Make sure you set a seed in the sample command, so that the train and test searches see the same partitions! E.g.
| sample partitions=5 seed=42
