Is it possible to use cross-validation in the Machine Learning Toolkit and Showcase app?
Hello,
Is it possible to use cross-validation in the Machine Learning Toolkit and Showcase app?
Cross-validation is already in the Toolkit:
https://docs.splunk.com/Documentation/MLApp/4.1.0/User/ScoreCommand#K-fold_scoring
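Per the docs linked above, k-fold scoring is requested by adding the kfold_cv parameter to a fit command. A minimal sketch (the algorithm, target, and feature field names here are illustrative, not from the original thread):

... search to get your dataset | fit LogisticRegression target from feature_* kfold_cv=5

This holds out each of the 5 folds in turn, trains on the remaining 4, and returns one row of scoring metrics per fold, so no manual partitioning is needed.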
The assistants for Predict Numeric Fields and Predict Categorical Fields do 2-fold cross-validation for you automatically, and you can select the train-test split ratio of your choosing.
Well, if you are looking for automated cross-validation or a single command that performs it, the answer is probably no at this moment.
Here is what I do for now.
For example, for K-fold cross-validation with K=5, you can split your data into 5 partitions using the sample command.
... search to get your dataset | sample partitions=5
This adds a partition_number field to the dataset, so you can filter on that number to select one portion of the data.
Then use one partition (1/5 of the data) to create the model (as the training set) and the rest of the data for testing.
... search to get your dataset | sample partitions=5 | where partition_number=0 | fit ... into your_model | ..
and test with the rest
... search to get your dataset | sample partitions=5 | where partition_number!=0 | apply your_model | ..
Then calculate the errors and consolidate the results from each validation run.
You could probably automate this with other Splunk job-scheduling mechanisms (a scheduled search, or a summary index plus a dashboard).
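For the error-calculation step, one sketch for a numeric target (assuming a target field named y; apply writes its regression output to a field named predicted(y), and both field names here are assumptions for illustration):

... search to get your dataset | sample partitions=5 seed=1 | where partition_number!=0 | apply your_model | eval sq_err = pow(y - 'predicted(y)', 2) | stats avg(sq_err) as mse | eval rmse = sqrt(mse)

Repeat this with a different partition held out each time, then average the per-run errors to get the consolidated cross-validation estimate.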
To apply k-fold cross-validation (using 5 folds as in the above example), you should train with 4 folds and then test with the remaining fold. The code example above does the opposite. So it should be:
Train with 4 folds
| sample partitions=5 seed=1 | where partition_number!=0 | fit ... into your_model |
Test with 1 fold
| sample partitions=5 seed=1 | where partition_number=0 | apply your_model
Make sure you set a seed in the sample command, so that the train and test searches see the same partitions! E.g.
| sample partitions=5 seed=42
