How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

I am trying to train a clustering model but keep running into the memory limit error because the data set is large. I would like to use event sampling, but I am not aware of a command for it.


How big is your sample data? Do you need to train on a sample this large? Why not train on a smaller set if it represents a good percentage of the data you need?
Are you sure you're not bumping into limits as opposed to running out of memory?

Skoelpin, I have 500k observations. I want to limit the data to a smaller set because I am only using an MLTK sandbox to judge whether MLTK is the right solution for us before configuring it in PROD.
Let me know if you have a solution. Thanks!


The point I'm trying to make is: why sample a larger data set when you can just reduce the size of the training data set?
Are you sure you're not bumping into limits as opposed to running out of memory?
Lastly, the MLTK is a collection of libraries imported into Splunk. It will work if you give it the right data and ask the right questions.

When we just reduce the size of the training data set, the observations (rows/events) are not selected randomly. As a result, the reduced data can't closely represent the whole population.
If we use sampling, the data is selected randomly and is more representative of our data set.
You are right, I am bumping into limits. I have already requested an increase to the limit. In the meantime, I wanted to learn how to sample using SPL to meet the immediate need.


You can use event sampling above the search bar to accomplish this. You can also use certain SPL techniques to do sampling, such as
| eval samplingperc=20
| eval search=ceil(100/samplingperc)
With samplingperc=20, ceil(100/20) evaluates to 5, which means a ratio of 1 in 5, i.e. sample 20% of the data.
Lastly, you can control these limits in the MLTK UI directly, under the Settings tab in the nav bar. If this answered your question, please accept it.
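
The ratio computed above still needs a step that actually drops events before the model is trained. A minimal sketch of one way to do that in plain SPL, assuming a roughly 20% random sample is wanted: `random()` returns a pseudo-random integer per event, and `where` keeps only the events whose value falls in the lowest 20 of 100 buckets. The index, sourcetype, field names (`field_a`, `field_b`), the helper field `rand`, and `k=3` are all placeholders rather than anything from this thread.

```
index=my_index sourcetype=my_sourcetype
| eval rand=random() % 100
| where rand < 20
| fields - rand
| fit KMeans k=3 field_a field_b
```

Because the selection is random, the number of events kept will vary slightly from run to run, and it should be close to, but not exactly, 20% of the input.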

Thanks a lot.

These commands don't seem to work. Are there any limitations?
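
One thing worth checking here: `eval` only adds or changes fields on each event and never removes events, so the result count stays the same unless a filtering or sampling command follows it. A hedged sketch of an alternative, assuming the `sample` command that ships with the MLTK app is available in the installed version and accepts the `ratio` and `seed` options as documented; the index and field names are hypothetical.

```
index=my_index sourcetype=my_sourcetype
| sample ratio=0.2 seed=42
| fit KMeans k=3 field_a field_b
```

Here `ratio=0.2` is intended to keep each event with a probability of about 20%, and a fixed `seed` should make the sample repeatable between runs. If the command is not recognized, checking that the search runs in the MLTK app context (or that the command is shared globally) would be a reasonable first step.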
