Splunk Search

Find a distinct count using a priority system.

Communicator
:: my search :: | stats count dc(player) by result 

Let’s say the result field has two possible values, Win and Lose.

This search will give me the number of unique players for each value of result. So, a count of players that won at least once, and a count of players that lost at least once. So if a player both wins and losses (in different events), they would be counted in both totals.

What I really need, is a count of players that have won at least once, and a count of players that never won. So if a player tried 10 times, loses his first 9, but wins the 10th…. I want that to increment the Win count by one, and not affect the Lose count. Obviously if the player only has 1 result, it will increment the appropriate count.

And getting more complicated, what if the result field had three values? Win, Lose, and Tie, and I wanted to apply the same general idea. So Win > Tie > Loss. So as soon as a player has a result that is ranked higher, they will increment the count of this higher result.

How can I build a search to tally my events in this way?

Thanks.

0 Karma
1 Solution

Influencer

How about this:

:: your search :: | chart count over player by result | eval result=case(Win>=1,"Win",Tie>=1,"Tie",1==1,"Lose") | stats count as num_players by result

Assuming result can only have the values "Win" "Tie" and "Lose" of course. We get the stats for each player, then we figure out which one each counts toward, and then build stats of how many players count to each.

If you also want the count of events with each of these results in addition to the players at each... so far what i've come up with is a bit complicated:

:: your search :: | chart count over player by result | eval result=case(Win>=1,"Win",Tie>=1,"Tie",1==1,"Lose") | appendpipe [stats count as num_players by result | eval keep=1] | appendpipe [where NOT keep=1 | fields - result num_players keep | untable player result count | where count > 0 | eval keep=1] | where keep=1 | stats sum(count) as events first(num_players) as num_players by result

Also just in case you were curious this is what I've been using as :: your search :: to try things out:

| gentimes start=-2 increment=1h | rename starttime as _time | fields | eval player="player"+tostring(floor((random()/2147483648)*100)+1) | eval win=floor((random()/2147483648)*3) | eval result= case(win==0,"Lose",win==1,"Tie",win==2,"Win") 

Update per comment:

Assuming result takes the values 2,1,0 and that 2 means "win", 1 means "tie" and 0 means "loss" (i.e. already priority order). then the simple process still follows basically the same process, except if you are only interested in the priority you don't need the full stats just a simple max:

 :: your search :: | stats max(result) as result by player | stats count as num_players by result

Doing the count of events in addition to the players of each becomes much easier as well:

:: your search :: | stats count by player, result | eventstats max(result) as player_result by player | stats sum(count) as events count(eval(player_result==result)) as num_players by result 

View solution in original post

0 Karma

Influencer

In answering your follow up question I felt like this had some strange machinations of its own, so it deserved it's own answer. (vote up if it's useful, or if mods can take your follow up and this answer and split them out into a new question, that'd be cool too).

The goal being a timechart of the number of players in a particular tier, making sure to remove them from their last tier when they bump up tiers. First the solution I came up with:

:: your search :: 
| streamstats max(result) as tier by player
| dedup player, tier
| eval add = 1
| streamstats current=false last(tier) as remove by player 
| appendpipe [search remove=* | rename remove as tier | eval add = -1] 
| sort count=0 _time 
| streamstats sum(add) as in_tier by tier 
| timechart last(in_tier) by tier 
| streamstats last(*) as *

So let's break this down.

| streamstats max(result) as tier by player | dedup player, tier - Find the player's tier for every record, and limit only to the times the tier changes
| eval add = 1 - for every result we have now, we want to add one to the tier.
| streamstats current=false last(tier) as remove by player - at every step, find the last tier the player was in if any
| appendpipe [search remove=* | rename remove as tier | eval add = -1] - for every previous tier, add a result for the same time, for the previous tier with the add being a decrement instead of an increment.
| sort count=0 _time - put the decrementing results in the proper time series order
| streamstats sum(add) as in_tier by tier - calculate the number currently in each tier as each record comes in
| timechart last(in_tier) by tier - for every time segment, the number currently in the tier is the last value we saw for a number in the tier for any given tier
| streamstats last(*) as * - fill in gaps where a a particular time segment has no change for a particular tier

and voila!

Pretty timechart!!!

Now dear reader, I'll leave it as an exercise to you to figure out the best way to start each tier in your search with a particular non-zero value. (but you probably want to insert events at the same time as or just before the first event or the beginning of your search window with tier and add fields specified before calculating the number in a particular tier).

0 Karma

Influencer

How about this:

:: your search :: | chart count over player by result | eval result=case(Win>=1,"Win",Tie>=1,"Tie",1==1,"Lose") | stats count as num_players by result

Assuming result can only have the values "Win" "Tie" and "Lose" of course. We get the stats for each player, then we figure out which one each counts toward, and then build stats of how many players count to each.

If you also want the count of events with each of these results in addition to the players at each... so far what i've come up with is a bit complicated:

:: your search :: | chart count over player by result | eval result=case(Win>=1,"Win",Tie>=1,"Tie",1==1,"Lose") | appendpipe [stats count as num_players by result | eval keep=1] | appendpipe [where NOT keep=1 | fields - result num_players keep | untable player result count | where count > 0 | eval keep=1] | where keep=1 | stats sum(count) as events first(num_players) as num_players by result

Also just in case you were curious this is what I've been using as :: your search :: to try things out:

| gentimes start=-2 increment=1h | rename starttime as _time | fields | eval player="player"+tostring(floor((random()/2147483648)*100)+1) | eval win=floor((random()/2147483648)*3) | eval result= case(win==0,"Lose",win==1,"Tie",win==2,"Win") 

Update per comment:

Assuming result takes the values 2,1,0 and that 2 means "win", 1 means "tie" and 0 means "loss" (i.e. already priority order). then the simple process still follows basically the same process, except if you are only interested in the priority you don't need the full stats just a simple max:

 :: your search :: | stats max(result) as result by player | stats count as num_players by result

Doing the count of events in addition to the players of each becomes much easier as well:

:: your search :: | stats count by player, result | eventstats max(result) as player_result by player | stats sum(count) as events count(eval(player_result==result)) as num_players by result 

View solution in original post

0 Karma

Communicator

Follow up question:

How can I get this data broken down over a time window? This search gives me the entire time window in 1 set of data, if I wanted to break it down by day, or hour for example.

When I try to use timechart I get no results found. I assume this is caused by the stats command not making the _time field accessible anymore?

0 Karma

Influencer

That's correct. stats is removing the _time field. It'd be quite a bit of a different search, involving likely a dedup, maybe bin as well.

0 Karma

Communicator

Yes, this is great!

One other question. I realized my example might not be a perfect representation of what I am trying to do. In my real data, the values of “result” are not text strings. How would it effect this if instead of “win” and “loss” it was numbers? So now result could be a 2, 1, or 0.

It seems like this falls short in the case function. How do I reference the number of twos or ones identified in the chart command?

I know I could use eval and if statements to get a field with text values in it, but I don’t want to add extra work to the search if I don’t need to.

0 Karma

Influencer

Updated answer per comment 🙂 it actually seems to become much easier if you already have priority numbers.

0 Karma

Communicator

Yes, this is exactly it. Thank you.

Sorry for the misrepresentation of my question resulting in a more complex answer. I do appreciate knowing how to do it each way now...

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!