How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

Explorer

I am trying to train a clustering model but keep running into the memory limit error because the dataset is large. I would like to use event sampling, but I don't know the command for it.
How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

SplunkTrust

How big is your sample data? Do you need to train on such a large sample? Why not train on a smaller sample set if it represents a good percentage of the data needed?

Are you sure you're not bumping into limits as opposed to running out of memory?

Explorer

Skoelpin, I have 500k observations. I want to limit this to a smaller set because I am just using an MLTK sandbox to judge whether MLTK is the right solution for us before configuring it in PROD.

Let me know if you have a solution. Thanks!

SplunkTrust

The point I'm trying to make is: why sample a larger data set when you can just reduce the size of the training data set?

Are you sure you're not bumping into limits as opposed to running out of memory?

Lastly, the MLTK is a collection of libraries imported into Splunk. It will work if you give it the right data and ask the right questions.

Explorer

When we just reduce the size of the training data set, the observations (rows/events) are not selected randomly. As a result, the reduced data can't closely represent the whole population.
If we use sampling, the data is randomly selected and is more representative of our data set.

You are right, I am bumping into limits. I have already requested to increase the limit. In the meantime, I wanted to learn about how I can sample using SPL to serve the immediate needs.

SplunkTrust

You can use event sampling above the search bar to accomplish this. You can also use certain SPL techniques for sampling, such as:

| eval samplingperc=20
| eval search=ceil(100/samplingperc)

This means: sample 20% of the data.
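
Note that, on their own, the two eval statements only compute a value on each event (ceil(100/20) = 5, i.e. a 1:5 ratio); they do not drop any events. A minimal sketch of sampling roughly 20% of events in SPL follows. The first search uses the random() eval function with a where filter; the second assumes the sample command that ships with MLTK is available in your version. The index and sourcetype names, and the seed value, are hypothetical placeholders, not taken from this thread.

```spl
index=your_index sourcetype=your_sourcetype
| eval rand = random() % 100
| where rand < 20
| fields - rand

index=your_index sourcetype=your_sourcetype
| sample ratio=0.2 seed=42
```

Run the two searches separately. Either one can then be piped into fit to train the clustering model on the reduced set. The random() approach yields approximately, not exactly, 20% of events, and produces a different subset on each run.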

Lastly, you can control these limits in the MLTK UI directly under the Settings tab in the nav bar. If this answered your question, please accept it.

Explorer

Thanks a lot.

Explorer

These commands don't seem to work. Are there any limitations?
