Not so much a question as an answer: here's how I found a way to select random "foo" in Splunk.
your search for foo
| eval rnd=random()
| sort rnd
| streamstats count by foo
| where count <=10
| sort foo
| table _time foo
This takes your search results, tacks on an extra field containing a random number, and sorts by that number to randomise the order of the results. streamstats then adds a running count per "foo" - this could be a device, host, ticket number, whatever. We keep only the rows where that count is <=10 (giving up to 10 events for each foo), sort by that field to group them back together - and voilà, a random selection of 10 events per foo.
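If you want 10 random distinct values of "foo" rather than events per value, a similar pattern should work - a rough sketch along the same lines (untested):
your search for foo
| stats count by foo
| eval rnd=random()
| sort rnd
| head 10
| fields foo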
Hope someone finds it as useful as I have.
Cheers,
Ash
For a contiguous set of 10 events from a random place in the data you could do this:
index=_internal | head [| tstats count where index=_internal | eval random = (random() % count) + 1 | return $random] | tail 10
The tstats subsearch counts the number of events and produces a random value between 1 and count, inclusive. That value is passed to head, which retrieves that many events from the start of the results... then tail keeps the last ten of those. This should be much faster than loading everything, running streamstats over everything, and sorting - but it isn't doing quite the same thing, so both ways are valid.
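To illustrate with a made-up number: if the subsearch happens to return 4321, the outer search effectively becomes
index=_internal | head 4321 | tail 10
which keeps events 4312 to 4321 - a contiguous block of ten starting at a random offset.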
They're two different use cases, yeah - just seemed to fit in with the topic 🙂
As for your tstats, be wary of its syntax. It's slightly different and a bit tossed around compared to regular searching: http://docs.splunk.com/Documentation/Splunk/6.1.1/SearchReference/tstats
Also, make sure you look at the docs for the right version - it's one of the newer commands.
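As a rough illustration of the difference (check the docs for your version): where a normal search might be
index=_internal | stats count by sourcetype
the tstats form runs directly against the indexed fields, as a generating command:
| tstats count where index=_internal by sourcetype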
Thanks for the comments Martin, and certainly another approach for a block of 10 events. I ran that, but received an error: "Error in 'TsidxStats': Missing 'FROM' keyword to specify namespace"
I was aiming for 10 individual samples spread across the search, which is the reason for needing the random number against each event. In this scenario there were only a few thousand events, so streamstats did not cause too much of a perf hit - but I do get the point about a large set.
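For what it's worth, if the tstats syntax misbehaves on a given version, the same contiguous-block idea should work with a plain stats subsearch instead - an untested sketch, slower since it scans the events rather than the tsidx files, but it avoids the namespace error:
index=_internal | head [search index=_internal | stats count | eval random=(random() % count) + 1 | return $random] | tail 10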