How do I set a sampling ratio for initial search f...

ruwalbi · ‎01-27-2020

I am trying to train a clustering model but keep running into a memory limit error because the data set is large. I would like to use event sampling, but I don't know the command for it.
How do I set a sampling ratio for the initial search in Splunk MLTK? Is there a specific SPL command for that?

skoelpin · ‎01-27-2020

How big is your sample data? Do you need to train on a sample this large? Why not train on a smaller sample set if it represents a good percentage of the data needed?

Are you sure you're not bumping into limits as opposed to running out of memory?

ruwalbi · ‎01-27-2020

skoelpin, I have 500k observations. I want to limit this to a smaller set because I am just using an MLTK sandbox to judge whether MLTK is the right solution for us before configuring it in PROD.

Let me know if you have a solution. Thanks!

skoelpin · ‎01-27-2020

The point I'm trying to make is: why sample a larger data set when you can just reduce the size of the training data set?

Are you sure you're not bumping into limits as opposed to running out of memory?

Lastly, the MLTK is a collection of libraries imported into Splunk. It will work if you give it the right data and ask the right questions.

ruwalbi · ‎01-27-2020

When we just reduce the size of the training data set, the observations (rows/events) are not selected randomly. As a result, the reduced set may not closely represent the whole population.
If we use sampling, the data is randomly selected and is more representative of our data set.

You are right, I am bumping into limits. I have already requested to increase the limit. In the meantime, I wanted to learn about how I can sample using SPL to serve the immediate needs.

skoelpin · ‎01-27-2020

You can use event sampling above the search bar to accomplish this. You can also use certain SPL techniques to do sampling, such as:

| eval samplingperc=20
| eval search=ceil(100/samplingperc)

This means: sample 20% of the data (a sampling ratio of 1 in 5).

Lastly, you can control these limits in the MLTK UI directly, under the Settings tab in the nav bar. If this answered your question, please accept it.

ruwalbi · ‎01-27-2020

Thanks a lot.

ruwalbi · ‎01-27-2020

These commands don't seem to work. Are there any limitations?
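
For readers with the same problem: the two eval commands quoted earlier in this thread only compute a field (ceil(100/20) = 5, i.e. a 1-in-5 ratio) and never filter any events, which may be why they appear to have no effect. A minimal sketch of one SPL idiom that does drop events at random follows. The index, sourcetype, field names, model name, and k value are hypothetical placeholders, not taken from this thread; random() is the standard eval function that returns a pseudo-random integer.

```spl
index=my_index sourcetype=my_sourcetype
| eval rand = random() % 100
| where rand < 20
| fields - rand
| fit KMeans k=3 field_a field_b into my_sandbox_model
```

Each event is kept with a probability of roughly 20%, so the resulting count is approximate rather than exact. MLTK also ships a sample search command; assuming it is available in the installed MLTK version, the same idea could be written as:

```spl
index=my_index sourcetype=my_sourcetype
| sample ratio=0.2 seed=42
| fit KMeans k=3 field_a field_b into my_sandbox_model
```

The seed value here is arbitrary and only makes the sample repeatable between runs. Either approach is a sketch to adapt, not a confirmed fix for the limit error discussed above.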
