I have a system that tracks the status of various users. For example a user could be: In the office, Out of the office, On break, or Sleeping. My system generates a log anytime a status is changed and logs a bunch of data: time, user, status, etc…
Time, User, Status
9/28/2015 6:08:45, Sara, In The Office 9/28/2015 6:14:36, Mike, In The Office 9/28/2015 8:57:00, Jordan, In The Office 9/28/2015 10:30:07, Sara, On Break 9/28/2015 11:02:31, Sara, In The Office 9/28/2015 15:30:12, Mike, Out Of The Office 9/28/2015 16:48:02, Jordan, Out Of The Office
This is simple with this small log example, in reality I have ~100 users and ~12 different statuses.
How can I take this data, and create a graph that shows the number of people in a specific status (or multiple statuses) over a time window? Or answer “How many people were ‘In the Office’ at 10:23 AM?”
It is safe to assume that every user is “Out of the Office” at midnight which basically functions as a reset. Which is probably relevant to building a running count and allows me to start my search on any given day instead of needed to go back to “the beginning of time” to get accurate counts.
I will not be getting the logs streamed in real-time, I will get a one-time dump of the events each day. I am not sure if this is relevant to how it would work.
Thanks for any advice, guidance, or assistance.
Assuming you have a
user field and a
status field, start with a search like this (check out the
... | timechart span=1h dc(user) AS numUsersThisState BY status
How will this count the same user with multiple events showing different statuses within the span? Will it count for each? If not which one will it count?
9/28/2015 6:08:45, Sara, In The Office 9/28/2015 6:14:36, Sara, On Break 9/28/2015 6:21:00, Sara, Sleeping
It will count 1 for each state, mapped to the line for that state but only once (if he was in state 'Z' ten times from 10-11, he will be counted only once for state 'Z'.
This is a fun question. The answer touches on some things about the
timechart commands that are surprisingly complicated, but the good news is that those things are pretty key to understanding advanced ways to group your results in splunk. If you feel a little fuzzy about the difference between the
chart commands, maybe read this answer first.
A very simple search you might try at first would be:
<search terms> | timechart span=1min dc(User) by Status
As I think you've already discovered, these results don't incorporate the fact that the User remains in a Status until the next Status event, and as such... the results are pretty misleading.
So we need to kind of "fill in the last known value" of Status. Whenever you need that sort of thing, your two main tools are the
streamstats commands. Which one of the two you use basically depends on whether you're doing the core grouping with the
stats command on the one hand, or the
chart commands on the other hand. Here I'm going to use
timechart so this means
filldown is the right tool. Again, for more general reading about stats/chart, read this other post
So, filldown. Often you use
filldown with some explicit fields listed Here we actually want to filldown every value of "User" and as it happens we can use
filldown * to do that.
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown *
The timechart command there outputs rows that are every one of the various 1min times in our timerange, and columns that are all the values of User. Then
filldown * takes timechart's output and smears the last-known value down each user's column, to reflect our reality that the user remains in the last state. (Since timechart's output happens to be sorted by _time in the right direction, it works out and we don't need a
A problem arises though. The leftmost points on this chart are going to be null, whereas we want them to say "out of office". Fixing this is a job for the
fillnull command. More commonly you see
fillnull someField value="foo" but here our fields are actually our username values and so we want it to fill null values in any of them, so here you won't specify any field.
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office"
(NOTE: We accepted a bit of a contract here, to always run this search with a timerange that starts at midnight. If we were to break that assumption things would get weird)
OK. This gives you a nice visualization!
Now.... chances are that you want to do further statistics on this. What you need is a strong "base" search on which to do stuff. It turns out that
stats, or more generally "stats style" result rows, are a better base to build on.
So here we do something strange and we use the
untable command to convert from our nice easily-visualizable "chart style" rows, to the "stats style" rows.
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status
This resultset will now have rows, such that for every User, there's a row for every 1 minute period in the timerange, and that User has an explicit value for "Status". It has no gaps and at 12am everyone starts at the "out of office" Status.
(here's that link again if you want to read more about key differences between stats and chart)
And from this as our sort of "base", we can now branch out and do lots of fun reporting, like
example 1) How many users were in each given state during the course of the day:
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | timechart span=1min dc(User) by Status
example 2) Over the whole timerange, which user spent the least time in the "out of office" state.
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | search Status="out of office" | stats count as minutes by User | sort + minutes
example 3) With each user in their own row, what is the total distribution of minutes spent in each state.
<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | chart count as minutes over User by Status
4) your crazy idea here....
Now, if you're even still reading this, you might ask "if we need to end up with "stats style" rows, why don't we just do those earlier "filling in blank values" tasks over in "stats style" rows rather than doing them on chart output and then doing this confusing conversion?" The answer is that sometimes it's just a lot easier or more effective to do some tasks on one side vs the other. Here in particular although you could in theory use
streamstats and other advanced trickery to compute our "last known" stuff in a stats-style set, it's just really really painful. The chart side in this particular case just does those tasks more easily. Take another look at Example 4, where we're actually then converting again back to a different chart-style set.