Splunk Search

Random sampling ratio in subsearch (long OR list) only?

alancalvitti
Path Finder

Is it possible, via Splunk's Python SDK, to specify an event sampling ratio (say 1:1000), or some equivalent random selection, inside a subsearch that returns a long OR expression, while ensuring that the outer search does not sample?

For concreteness, the subsearch is:

[search index=my_index   |  rex "(?i)deviceId=(?P<DevId>[^ ]+)" | dedup DevId | return 1000000 $DevId]

This returns a long OR list, each term of which can match one or more events. It is critical to extract all events associated with each randomly sampled device.

0 Karma

DavidHourani
Super Champion

Hi @alancalvitti,

Event sampling applies to the result of your search. If you use a subsearch to generate an OR filter, the filter itself will not be subject to sampling, but the result of the filtered search will be.

As mentioned here: https://docs.splunk.com/Documentation/Splunk/latest/Search/Retrieveasamplesetofevents

If a search matches 1,000,000 events when sampling is not used, using a sample ratio value of 100 would result in returning approximately 10,000 events.

So whatever you filter on in your search is applied as is, and the sampling takes place afterwards.

Hope that helps.

Cheers,
David

0 Karma

alancalvitti
Path Finder

Thanks, but I need the logic the other way around: sampling (with specified ratio) in subsearch, and no sampling in outer search. Is there a way to emulate this behavior?

0 Karma

DavidHourani
Super Champion

You can use a modulus to do so in your subsearch, making it look something like this:

[search index=my_index | rex "(?i)deviceId=(?P<DevId>[^ ]+)" | dedup DevId | streamstats count as sampler | eval sampler=sampler%5 | where sampler=0 | return 1000000 $DevId]

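This will use a fixed sampling rate of 20% (modulus 5): after dedup, every fifth distinct DevId is kept. Note that the selection is deterministic (every Nth value) rather than truly random.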
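
Since the question asks about driving this from Python, here is a minimal sketch of how the search string above could be built for an arbitrary modulus. The helper names `build_sampled_subsearch` and `build_outer_search` are hypothetical, and the `use_random` variant is an assumption that has not been tested in this thread: it swaps the every-Nth selection for a pseudo-random one based on the `random()` eval function.

```python
def build_sampled_subsearch(index, modulus, use_random=False, max_values=1000000):
    """Return an SPL subsearch keeping roughly 1 in `modulus` distinct DevIds.

    Hypothetical helper: it only builds a string, it does not talk to Splunk.
    """
    if modulus < 1:
        raise ValueError("modulus must be >= 1")
    if use_random:
        # Assumption: pseudo-random selection via the random() eval function.
        sampler = "eval sampler=random()%{0}".format(modulus)
    else:
        # Deterministic selection: number the deduplicated rows, keep every Nth.
        sampler = "streamstats count as sampler | eval sampler=sampler%{0}".format(modulus)
    return (
        '[search index={index} '
        '| rex "(?i)deviceId=(?P<DevId>[^ ]+)" '
        '| dedup DevId '
        '| {sampler} '
        '| where sampler=0 '
        '| return {max_values} $DevId]'
    ).format(index=index, sampler=sampler, max_values=max_values)


def build_outer_search(index, subsearch):
    """Wrap the sampled subsearch in an outer search that applies no sampling."""
    return "search index={0} {1}".format(index, subsearch)


if __name__ == "__main__":
    sub = build_sampled_subsearch("my_index", 5)
    print(build_outer_search("my_index", sub))
```

The resulting string could then be handed to the SDK, for example through `service.jobs.create(query)` in splunklib, leaving any sampling option on the outer job at its default so that all events for the selected devices are returned.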

0 Karma

alancalvitti
Path Finder

That's clever. That sampler strategy, coupled with the outer query to return events, seems to return reasonable results for short time spans, e.g. 1 hour. But when I increase the time range to, say, 24 hours, only a few events are matched (keeping the sampler rate fixed at, say, sampler%1000). Any idea why?

0 Karma

DavidHourani
Super Champion

It could be that the subsearch is timing out and returning whatever it has collected by then. Check how long the subsearch takes, either in the job inspector or by running it separately.
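
For reference, the documented subsearch defaults in limits.conf are a `maxtime` of 60 seconds and a `maxout` of 10,000 results, unless your deployment overrides them. A rough way to time the subsearch on its own from Python is sketched below. Both `standalone_query` and `time_search` are hypothetical helpers, and `run_search` is a stand-in for whatever callable actually executes the search in your environment.

```python
import time


def standalone_query(subsearch):
    """Strip the enclosing [ ] so the subsearch can run as a top-level search."""
    inner = subsearch.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1].strip()
    return inner


def time_search(run_search, query):
    """Run `query` via the supplied callable; return (elapsed_seconds, result)."""
    start = time.monotonic()
    result = run_search(query)
    return time.monotonic() - start, result


if __name__ == "__main__":
    sub = '[search index=my_index | dedup DevId | return 1000000 $DevId]'
    # Stub runner for illustration only; replace it with a real search call.
    elapsed, _ = time_search(lambda q: None, standalone_query(sub))
    print("subsearch took {0:.2f}s".format(elapsed))
```

With the Python SDK, `run_search` could be something like `lambda q: service.jobs.oneshot(q)` on a connected splunklib service; that wiring is an assumption and is not tested here. If the elapsed time is close to the subsearch time limit, the OR list handed to the outer search is probably truncated.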

0 Karma