I noticed that the sample command in Splunk is limited in how many parameters can be used at the same time:
https://docs.splunk.com/Documentation/MLApp/4.2.0/User/Customsearchcommands#sample
I am interested in replicating the functionality of numpy.random.choice in Python shown below. Here's an example of its output:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], dtype='<U11')
So basically I would like to sample using both "proportional" and "count" at the same time. Has anyone come across this issue before, and how did you work around it in SPL? Thank you.
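To make the target behaviour concrete, here is a minimal sketch that uses only the Python standard library instead of numpy. random.choices draws a fixed number of items (k, the "count" part) with per-item weights (the "proportional" part), with replacement, which matches the default behaviour of numpy.random.choice. The seed and variable names are only for illustration.

```python
import random
from collections import Counter

aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
weights = [0.5, 0.1, 0.1, 0.3]

rng = random.Random(42)  # seeded only so the illustration is repeatable

# "count" is k, "proportional" is weights; draws are made with replacement
picked = rng.choices(aa_milne_arr, weights=weights, k=5)

# Over many draws the observed frequencies approach the weights
freq = Counter(rng.choices(aa_milne_arr, weights=weights, k=10_000))
```

The sample size is always exactly k, no matter how small the individual weights are, and that is the property the chained SPL commands cannot guarantee.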
Hi @cosminstefanmarin,
In your case, if you want to use both proportional and count, you can chain the two commands, starting with proportional, which fits what you are trying to achieve.
... | sample proportional="some_field" | sample count=20
Since count is random and proportional isn't, starting with proportional then adding count should do the trick.
Let me know what you think.
Cheers,
David
Tried that already, doesn't provide the expected output.
I'll give you an example:
| sample proportional="some_field" returns a random number of events, say 5,
which means the | sample count=20 immediately after it cannot pull 20 events, because there is nothing left to draw from; in this case it is limited to only 5!
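The effect can be reproduced outside Splunk. The sketch below assumes that proportional keeps each event independently with the probability stored in the named field, and that count returns at most that many events. The function names and the 5% probability are made up for illustration; this is not the code in sample.py.

```python
import random

def sample_proportional(events, field, rng):
    """Keep each event independently with probability event[field]."""
    return [e for e in events if rng.random() < e[field]]

def sample_count(events, count, rng):
    """Return `count` random events, or all of them if fewer are available."""
    return rng.sample(events, min(count, len(events)))

rng = random.Random(0)
# 100 hypothetical events, each with a 5% chance of being kept
events = [{"id": i, "some_field": 0.05} for i in range(100)]

stage1 = sample_proportional(events, "some_field", rng)
stage2 = sample_count(stage1, 20, rng)
# On average only about 5 events survive the first stage,
# so the second stage usually cannot reach 20.
```

Whatever the first stage returns is an upper bound for the second, so the final size is min(20, len(stage1)) rather than a guaranteed 20.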
Yeah you're right, and if you do it the other way around then it doesn't make sense at all...
The only way it would work is if your count is smaller than the total number returned by proportional. But that makes sense, doesn't it? If you get 5 that match with proportional, then that's all you were going to get, even with a count of 20 mixed in.
Unless what you're trying to do is force proportional to give more results than it ought to, in which case I'm not sure what the point of proportional would be in the first place. Do you see my point?
The reason for using proportional is to give different probabilities to certain items, based on a baseline built over a longer period of time. At the same time, I need count in order to sample different sizes per group via the "by" clause.
Hello @cosminstefanmarin,
I'm not entirely sure about this, but with MLApp you can try the following:
| sample count=<value of count> proportional=<name of numeric field>
But as you can see, for proportional you need to give a field name that specifies the probability of each event. This gives you count random events, with each event's probability of being selected taken from the given field. Compared to the Python example, the array corresponds to the Splunk events.
Hope this helps!!!
I am afraid the command itself does not allow count and proportional to be used at the same time; I mentioned this in the question. In my opinion this is a weakness of the command, and Splunk should address it as a feature enhancement.
I don't know when Splunk will implement this, but until then you can create your own custom command in Python and use the Python function you mentioned in the question. (You can put Python libraries in the bin directory of your app.)
I thought about that as well and will explore it in more detail. Another alternative would be to modify sample.py directly and add the missing functionality to the Splunk command itself. This could be a direct contribution to the community.
Yeah, I like your idea, that's great. You can introduce more arguments to the sample.py file and change the command logic accordingly.
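As a starting point for such a change, here is a minimal sketch of the sampling logic alone: up to count events per "by" group, drawn without replacement, with each event's chance weighted by a numeric field. It uses the Efraimidis-Spirakis key trick (key = u^(1/w), keep the largest keys), which is a technique chosen for this sketch and not what sample.py does today. The function name, its arguments, and the event layout are hypothetical, and the Splunk command wrapper is left out entirely.

```python
import random
from collections import defaultdict

def weighted_sample_by(events, weight_field, count, by_field=None, rng=None):
    """Pick up to `count` events per group, without replacement, weighted by `weight_field`."""
    rng = rng or random.Random()

    # Split events into groups; everything lands in one group if no by_field is given
    groups = defaultdict(list)
    for event in events:
        key = event.get(by_field) if by_field else None
        groups[key].append(event)

    result = []
    for group in groups.values():
        keyed = []
        for event in group:
            w = float(event[weight_field])
            if w <= 0:
                continue  # zero or negative weight: never selected
            # Efraimidis-Spirakis: larger weights tend to produce larger keys
            keyed.append((rng.random() ** (1.0 / w), event))
        keyed.sort(key=lambda pair: pair[0], reverse=True)
        result.extend(event for _, event in keyed[:count])
    return result

# Hypothetical events: two hosts, eight events each, weights from 0.1 to 0.8
events = [{"host": h, "id": i, "p": 0.1 + 0.1 * i} for h in ("a", "b") for i in range(8)]
picked = weighted_sample_by(events, "p", count=3, by_field="host", rng=random.Random(1))
```

Unlike chaining the two existing options, each group returns exactly count events whenever it has at least that many with a positive weight, and the weights only influence which events are chosen.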