Splunk Search

Can a search detect its own sample ratio?

Lowell
Super Champion

Is there a way for a search to determine its own sample ratio at search time?

This would be helpful when scaling results based on the sample ratio. For example, if eventtype=myevent | stats count returns 57 at a sample ratio of 10:1, then I can estimate that a full search (no sampling) would return a count of ~570.

Seems like there should be a way to do this with some combination of addinfo and possibly rest...?
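
To make the goal concrete, here is a minimal sketch of the scaling step with the ratio hard-coded. The field names sample_ratio and estimated_count are made up for illustration; the open question is how to fill in sample_ratio automatically instead of typing 10 by hand.

```
eventtype=myevent
| stats count
| eval sample_ratio=10
| eval estimated_count=count*sample_ratio
```

With a sampled count of 57, estimated_count comes out to 570.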

0 Karma
1 Solution

jconger
Splunk Employee

Something like this?

index=* 
    | addinfo 
    | stats values(info_sid) as sid 
    | join sid
        [| rest /services/search/jobs 
        | stats values(request.sample_ratio) AS request.sample_ratio by sid ]


0 Karma

Lowell
Super Champion

Here's an expanded example that approaches the problem in a slightly different way. It only looks up a single job via the REST request, but it needs an extra subsearch to do so (I'm not sure of the performance implications).

| stats count 
| appendcols 
    [ rest splunk_server=local /services/search/jobs 
        [ makeresults 
        | addinfo 
        | eval sid=replace(info_sid, "^.*subsearch_(.*?\d+\.\d+)_.*$", "\1") 
        | eval search="search=sid=".sid 
        | table search ] 
    | table request.* ]

Note that appendcols requires results, not events, as input. (Think "Events" tab vs. "Statistics" tab.) It's also possible to use append or (preferably) appendpipe, but then only a single result at the end contains the request.* fields. That is still workable in some situations, but less ideal.
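
To get from the looked-up ratio to the scaled estimate described in the question, one option is to finish the search with a couple of eval steps. This is an untested sketch: it assumes request.sample_ratio comes back as a number (N for an N:1 sample) and falls back to 1 if the field is missing. The names sample_ratio and estimated_count are made up for illustration, and eventtype=myevent is just the base search from the question.

```
eventtype=myevent
| stats count
| appendcols
    [ rest splunk_server=local /services/search/jobs
        [ makeresults
        | addinfo
        | eval sid=replace(info_sid, "^.*subsearch_(.*?\d+\.\d+)_.*$", "\1")
        | eval search="search=sid=".sid
        | table search ]
    | table request.sample_ratio ]
| eval sample_ratio=coalesce(tonumber('request.sample_ratio'), 1)
| eval estimated_count=count*sample_ratio
```

The single quotes around 'request.sample_ratio' are needed in eval because the field name contains a dot.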

0 Karma

Lowell
Super Champion

Well, I was really hoping for a hidden argument to addinfo that would give it to me. (Seems like the output of that command hasn't been updated since like the Splunk 3.x days, despite a massive increase in product functionality since then.)

But your join-based search works, assuming you don't end up with more than 1000 search jobs. I'd probably add splunk_server=local to the rest command.
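
For reference, the join-based search with that change would look roughly like this. The only difference is the added splunk_server=local argument, which keeps the REST call on the local search head. It has not been tested here.

```
index=*
    | addinfo
    | stats values(info_sid) as sid
    | join sid
        [| rest splunk_server=local /services/search/jobs
        | stats values(request.sample_ratio) AS request.sample_ratio by sid ]
```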

0 Karma