I am new to Splunk but own a system that uses Splunk as the backend. I want to create a query that only gives me a specific percentage of the possible results that I can then leverage for a phased deployment.
For example, I have 10,000 endpoints reporting in but I want to create a query that gives me a random 50% (5,000) of those endpoints as a result. I can then use that query to target a deployment to the company in two phases.
There is a random number generator in Splunk: the eval function random(). Use the modulo operator (%) to reduce its output to whatever range of numbers you want.
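As a quick illustration, the throwaway search below uses makeresults only to generate ten dummy rows. random() returns a pseudo-random integer between 0 and 2^31-1, "% 100" reduces it to 0-99, and adding 1 shifts the range to 1-100:

```spl
| makeresults count=10
| eval myGroup = 1 + (random() % 100)
| table myGroup
```

Run it a few times and you should see different values each time, all between 1 and 100.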
In this case, you could...
1) Create a lookup file (for instance, mySystems.csv) that includes ALL the systems, one row per system, with the system name in a host column.
2) Assign each system a random number from 1 to 100:
| inputlookup mySystems.csv
| eval myGroup = 1 + (random() % 100)
| outputlookup append=f mySystems.csv
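3) When new systems are added, append them to the lookup with the next group number (101 for the first batch, then 102, and so on), so they never land in a range you have already used. The lookup command takes the lookup name first and then the match field, the append flag is append=t, and keeping only host and myGroup stops extra search fields from being written into the file:
(your search that finds all systems)
| lookup mySystems.csv host OUTPUT myGroup
| eventstats max(myGroup) as maxGroup
| where isnull(myGroup)
| eval myGroup = coalesce(maxGroup, 100) + 1
| table host myGroup
| outputlookup append=t mySystems.csv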
4) For each deployment phase, select a range of myGroup values that does not overlap any range you have already deployed to.
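If you don't have the lookup yet, steps 1 and 2 can be combined into a single search. This is a sketch only: index=your_index and the 7-day window are hypothetical placeholders for whatever search finds all of your reporting endpoints. Because outputlookup overwrites the file by default, run it once and use step 3 for later additions:

```spl
index=your_index earliest=-7d
| stats count by host
| eval myGroup = 1 + (random() % 100)
| table host myGroup
| outputlookup mySystems.csv
```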
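For the two-phase, 50/50 rollout in the question, the first phase might look like the sketch below. It assumes the lookup has the host and myGroup columns described above; the cut-off of 50 is simply half of the original 1-100 range, and tonumber() guards against the CSV values being read as strings:

```spl
| inputlookup mySystems.csv
| where tonumber(myGroup) <= 50
| table host
```

For the second phase, change the condition to tonumber(myGroup) > 50. That selects the other half (groups 51-100) plus any systems appended later as group 101, 102, and so on.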