I am new to Splunk but own a system that uses Splunk as the backend. I want to create a query that only gives me a specific percentage of the possible results that I can then leverage for a phased deployment.
For example, I have 10,000 endpoints reporting in but I want to create a query that gives me a random 50% (5,000) of those endpoints as a result. I can then use that query to target a deployment to the company in two phases.
 
Splunk has a random number generator, the random() eval function. Use the modulo operator (%) to map its output onto whatever range of numbers you want.
In this case, you could...
1) Create a lookup file (for instance mySystems.csv) that includes ALL the systems, with the system name in a host field.
2) Assign each system a random number from 1 to 100:
  | inputlookup mySystems.csv
  | eval myGroup=1+ (random() % 100)
  | outputlookup append=f mySystems.csv
3) If new systems get added, append them with the next unused number (101, then 102 on the following run, and so on):
  (your search that finds all systems, one row per host)
  | lookup mySystems.csv host OUTPUT myGroup
  | eventstats max(myGroup) as maxGroup
  | where isnull(myGroup)
  | eval myGroup=maxGroup+1
  | fields - maxGroup
  | outputlookup append=t mySystems.csv
4) Each time you do something to a set, select a non-overlapping myGroup number range that you haven't used before.
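
For the two-phase deployment in the question, step 4 could look something like the sketch below. It assumes the lookup from step 1 keeps the system name in a field called host, and that groups 1 to 50 have not been used yet. Because the numbers are random, this gives roughly half of the systems, not exactly 5,000.

```spl
| inputlookup mySystems.csv
| where myGroup>=1 AND myGroup<=50
| fields host myGroup
```

The second phase would then take everything else, including any systems added later with group numbers above 100:

```spl
| inputlookup mySystems.csv
| where myGroup>50
| fields host myGroup
```

Since the group numbers are stored in the lookup, the same systems land in the same phase every time the search runs.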
