Splunk Search

Finding random() events

ahartge
Path Finder

Not so much a question as an answer: here is how I found a way to select a random sample of "foo" in Splunk.

your search for foo
| eval rnd=random()
| sort rnd
| streamstats count by foo
| where count <=10
| sort foo
| table _time foo

This takes your search results and tacks on another field containing a random number, then sorts by that number to randomise the order. streamstats then adds a running count per "foo", which could be a device, host, ticket number, whatever. We keep only the rows where that count is <= 10 (to get 10 events per foo), sort by foo to group them together, and voilà: a random selection of 10 events for each foo.
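For anyone who wants to reason about the logic outside Splunk, here is a minimal Python sketch of the same steps. It is only an illustration: the function name `sample_per_group`, its parameters, and the list-of-dicts event shape are assumptions made for this example, not anything Splunk provides.

```python
import random
from collections import defaultdict


def sample_per_group(events, key, n=10, seed=None):
    """Keep up to n randomly chosen events for each distinct value of `key`.

    `events` is assumed to be a list of dicts (a hypothetical stand-in for
    search results); `key` plays the role of the "foo" field.
    """
    rng = random.Random(seed)

    # eval rnd=random() | sort rnd : attach a random number and order by it
    decorated = [(rng.random(), event) for event in events]
    decorated.sort(key=lambda pair: pair[0])

    counts = defaultdict(int)
    kept = []
    for _, event in decorated:
        counts[event[key]] += 1        # streamstats count by foo
        if counts[event[key]] <= n:    # where count <= 10
            kept.append(event)

    # sort foo : group the surviving events by the field again
    kept.sort(key=lambda event: event[key])
    return kept
```

Calling `sample_per_group(events, "host", n=10)` on a list where each dict has a `host` entry would return at most 10 randomly picked events per host, grouped by host. A value with fewer than 10 events simply contributes everything it has.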

Hope someone finds it as useful as I have.

Cheers,
Ash

martin_mueller
SplunkTrust

For a contiguous set of 10 events from a random place in the data you could do this:

index=_internal | head [tstats count where index=_internal | eval random = (random() % count) + 1 | return $random] | tail 10

The tstats subsearch counts the number of events and produces a random value between 1 and count inclusive. That value is passed to head, which retrieves that many events from the start of the results; tail then keeps the last ten of those. This should be much faster than loading everything, running streamstats over everything, and sorting, but it isn't doing quite the same thing, so both ways are valid.
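The head/tail idea can be sketched in plain Python as well. Again, this is just an illustration under assumptions: `random_window` and its parameters are made-up names, and a Python list stands in for the search results.

```python
import random


def random_window(events, size=10, seed=None):
    """Return up to `size` contiguous events ending at a random position."""
    rng = random.Random(seed)

    count = len(events)              # tstats count
    if count == 0:
        return []

    # (random() % count) + 1 : a position between 1 and count inclusive
    end = rng.randrange(count) + 1

    first_events = events[:end]      # head <end>
    return first_events[-size:]      # tail 10 (kept in list order here)
```

One thing the sketch makes visible: if the random position comes out below 10, the window holds fewer than 10 events, because head has not retrieved ten events to take the tail of.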

martin_mueller
SplunkTrust

They're two different use cases, yeah - just seemed to fit in with the topic 🙂

As for your tstats, be wary of its syntax. It's slightly different and a bit tossed around compared to regular searching: http://docs.splunk.com/Documentation/Splunk/6.1.1/SearchReference/tstats
Also, make sure you look at the docs for the right version; it's one of the newer commands.

ahartge
Path Finder

Thanks for the comments, Martin, and that is certainly another approach for a block of 10 events. I ran it, but received this error: "Error in 'TsidxStats': Missing 'FROM' keyword to specify namespace"

I was aiming for 10 individual samples spread across the search, which is the reason for needing the random number against each event. In this scenario there were only a few thousand events, so streamstats did not cause too much of a perf hit, but I do get the point about running it against a large set.
