Is there any way to enable event sampling in a search?
I know this can be enabled in the GUI using the dropdown list under the search bar, but I need that functionality in a search statement.
I have seen suggestions such as adding "| eval rand=random() % 100 | where rand=0" to the search, but that is not really fast.
The ultimate goal is to use a custom sample rate in the searches behind a dashboard that takes a time range as an input. The range can be anywhere from a few minutes to a few weeks, so it has a big impact on execution time. I would like to give the user an option to specify a sample rate to speed up the search, and to adjust the calculated statistics accordingly.
How about this? Set samplingperc in the eval to the appropriate integer value; a value of 5 means it will select 5% of the total events.
your base search | eval sno=1 | accum sno | eval sno=sno%[| gentimes start=-1|eval samplingperc=5 | eval search=ceil(100/samplingperc) | table search] | where sno=0
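To make the logic of that search easier to follow, here is a minimal Python sketch of what it appears to compute: number the events, take the counter modulo ceil(100/samplingperc), and keep the events where the remainder is 0. The function name and the toy event list are hypothetical and only for illustration; this is not how Splunk executes the search.

```python
import math

def sample_every_nth(events, sampling_perc):
    """Keep every step-th event, where step = ceil(100 / sampling_perc)."""
    # Mirrors the subsearch: eval search=ceil(100/samplingperc)
    step = math.ceil(100 / sampling_perc)
    kept = []
    # Mirrors `eval sno=1 | accum sno`: a running counter starting at 1
    for sno, event in enumerate(events, start=1):
        # Mirrors `eval sno=sno%step | where sno=0`
        if sno % step == 0:
            kept.append(event)
    return kept

# Toy data: 1000 numbered "events" (an assumption for illustration)
events = list(range(1, 1001))
sample = sample_every_nth(events, 5)
print(len(sample))  # 50
```

Note that the loop still visits every event before discarding most of them, and that a percentage which does not divide 100 evenly is rounded by the ceil (for example, 30 gives a step of 4, which keeps 25%).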
Even though it samples the base query, I don't think it improves performance at all. In terms of performance, I don't think it is any better than the solution with the random() function.
Here is a solution that does not literally answer my question (it does not apply the sample rate in the query itself) but completely suits my needs and, I have to admit, is completely obvious:
<fieldset submitButton="false">
  .....
  <input type="dropdown" token="sample_rate">
    <label>Sample Rate</label>
    <choice value="1">1:1</choice>
    <choice value="10">1:10</choice>
    <choice value="100">1:100</choice>
    <choice value="1000">1:1000</choice>
    <choice value="10000">1:10000</choice>
    <fieldForLabel>sample_rate_label</fieldForLabel>
    <fieldForValue>sample_rate_value</fieldForValue>
    <default>1</default>
  </input>
</fieldset>
<query>index = netflow |lookups .... |eval bs=bytes * $sample_rate$ |stats sum(bs) by Country</query>
<sampleRatio>$sample_rate$</sampleRatio>
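The "bytes * $sample_rate$" step is what adjusts the statistic for sampling: with a 1:N ratio, the sum over the sampled events is multiplied by N to estimate the sum over all events. A rough Python sketch of that idea follows, assuming each event is kept independently with probability 1/N; the function name and the synthetic byte counts are made up for illustration.

```python
import random

def estimate_total(values, sample_rate, rng):
    """Keep each value with probability 1/sample_rate, then scale the sum by sample_rate."""
    # randrange(sample_rate) == 0 is true with probability 1/sample_rate
    sampled = [v for v in values if rng.randrange(sample_rate) == 0]
    # Same idea as `eval bs=bytes * $sample_rate$ | stats sum(bs)`
    return sum(sampled) * sample_rate

rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic per-event byte counts (an assumption, not real netflow data)
bytes_per_event = [rng.randint(100, 1500) for _ in range(100_000)]

exact = sum(bytes_per_event)
estimate = estimate_total(bytes_per_event, 10, rng)
print(exact, estimate)
```

This scaling only makes sense for additive statistics such as sums and counts. Averages and percentiles computed on the sample should be used as they are, not multiplied by the sample rate.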
Correct, this does not improve performance. Is Splunk planning on providing a true sampling function that is applied at the index level? With the amount of data being processed today, sampling is the only manageable way to produce metrics.