Splunk Search

How can I search and graph the number of people with a specific status (or multiple statuses) for a certain period of time?

JWBailey
Communicator

I have a system that tracks the status of various users. For example a user could be: In the office, Out of the office, On break, or Sleeping. My system generates a log anytime a status is changed and logs a bunch of data: time, user, status, etc…

Example Logs:
Time, User, Status

9/28/2015 6:08:45, Sara, In The Office
9/28/2015 6:14:36, Mike, In The Office
9/28/2015 8:57:00, Jordan, In The Office
9/28/2015 10:30:07, Sara, On Break
9/28/2015 11:02:31, Sara, In The Office
9/28/2015 15:30:12, Mike, Out Of The Office
9/28/2015 16:48:02, Jordan, Out Of The Office

This is simple with this small log example, in reality I have ~100 users and ~12 different statuses.

How can I take this data, and create a graph that shows the number of people in a specific status (or multiple statuses) over a time window? Or answer “How many people were ‘In the Office’ at 10:23 AM?”

It is safe to assume that every user is “Out of the Office” at midnight which basically functions as a reset. Which is probably relevant to building a running count and allows me to start my search on any given day instead of needed to go back to “the beginning of time” to get accurate counts.

I will not be getting the logs streamed in real-time, I will get a one-time dump of the events each day. I am not sure if this is relevant to how it would work.

Thanks for any advice, guidance, or assistance.

1 Solution

sideview
SplunkTrust
SplunkTrust

This is a fun question. The answer touches on some things about the stats, chart and timechart commands that are surprisingly complicated, but the good news is that those things are pretty key to understanding advanced ways to group your results in splunk. If you feel a little fuzzy about the difference between the stats and chart commands, maybe read this answer first.

A very simple search you might try at first would be:

 <search terms> | timechart span=1min dc(User) by Status

As I think you've already discovered, these results don't incorporate the fact that the User remains in a Status until the next Status event, and as such... the results are pretty misleading.

So we need to kind of "fill in the last known value" of Status. Whenever you need that sort of thing, your two main tools are the filldown and streamstats commands. Which one of the two you use basically depends on whether you're doing the core grouping with the stats command on the one hand, or the timechart/chart commands on the other hand. Here I'm going to use timechart so this means filldown is the right tool. Again, for more general reading about stats/chart, read this other post

So, filldown. Often you use filldown with some explicit fields listed Here we actually want to filldown every value of "User" and as it happens we can use filldown * to do that.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * 

The timechart command there outputs rows that are every one of the various 1min times in our timerange, and columns that are all the values of User. Then filldown * takes timechart's output and smears the last-known value down each user's column, to reflect our reality that the user remains in the last state. (Since timechart's output happens to be sorted by _time in the right direction, it works out and we don't need a reverse.)

A problem arises though. The leftmost points on this chart are going to be null, whereas we want them to say "out of office". Fixing this is a job for the fillnull command. More commonly you see fillnull someField value="foo" but here our fields are actually our username values and so we want it to fill null values in any of them, so here you won't specify any field.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office"

(NOTE: We accepted a bit of a contract here, to always run this search with a timerange that starts at midnight. If we were to break that assumption things would get weird)

OK. This gives you a nice visualization!

Now.... chances are that you want to do further statistics on this. What you need is a strong "base" search on which to do stuff. It turns out that stats, or more generally "stats style" result rows, are a better base to build on.

So here we do something strange and we use the untable command to convert from our nice easily-visualizable "chart style" rows, to the "stats style" rows.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status

This resultset will now have rows, such that for every User, there's a row for every 1 minute period in the timerange, and that User has an explicit value for "Status". It has no gaps and at 12am everyone starts at the "out of office" Status.

(here's that link again if you want to read more about key differences between stats and chart)

And from this as our sort of "base", we can now branch out and do lots of fun reporting, like

example 1) How many users were in each given state during the course of the day:

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | timechart span=1min dc(User) by Status

example 2) Over the whole timerange, which user spent the least time in the "out of office" state.

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | search Status="out of office" | stats count as minutes by User | sort + minutes

example 3) With each user in their own row, what is the total distribution of minutes spent in each state.

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | chart count as minutes over User by Status

4) your crazy idea here....

Now, if you're even still reading this, you might ask "if we need to end up with "stats style" rows, why don't we just do those earlier "filling in blank values" tasks over in "stats style" rows rather than doing them on chart output and then doing this confusing conversion?" The answer is that sometimes it's just a lot easier or more effective to do some tasks on one side vs the other. Here in particular although you could in theory use makecontinuous and streamstats and other advanced trickery to compute our "last known" stuff in a stats-style set, it's just really really painful. The chart side in this particular case just does those tasks more easily. Take another look at Example 4, where we're actually then converting again back to a different chart-style set.

View solution in original post

sideview
SplunkTrust
SplunkTrust

This is a fun question. The answer touches on some things about the stats, chart and timechart commands that are surprisingly complicated, but the good news is that those things are pretty key to understanding advanced ways to group your results in splunk. If you feel a little fuzzy about the difference between the stats and chart commands, maybe read this answer first.

A very simple search you might try at first would be:

 <search terms> | timechart span=1min dc(User) by Status

As I think you've already discovered, these results don't incorporate the fact that the User remains in a Status until the next Status event, and as such... the results are pretty misleading.

So we need to kind of "fill in the last known value" of Status. Whenever you need that sort of thing, your two main tools are the filldown and streamstats commands. Which one of the two you use basically depends on whether you're doing the core grouping with the stats command on the one hand, or the timechart/chart commands on the other hand. Here I'm going to use timechart so this means filldown is the right tool. Again, for more general reading about stats/chart, read this other post

So, filldown. Often you use filldown with some explicit fields listed Here we actually want to filldown every value of "User" and as it happens we can use filldown * to do that.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * 

The timechart command there outputs rows that are every one of the various 1min times in our timerange, and columns that are all the values of User. Then filldown * takes timechart's output and smears the last-known value down each user's column, to reflect our reality that the user remains in the last state. (Since timechart's output happens to be sorted by _time in the right direction, it works out and we don't need a reverse.)

A problem arises though. The leftmost points on this chart are going to be null, whereas we want them to say "out of office". Fixing this is a job for the fillnull command. More commonly you see fillnull someField value="foo" but here our fields are actually our username values and so we want it to fill null values in any of them, so here you won't specify any field.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office"

(NOTE: We accepted a bit of a contract here, to always run this search with a timerange that starts at midnight. If we were to break that assumption things would get weird)

OK. This gives you a nice visualization!

Now.... chances are that you want to do further statistics on this. What you need is a strong "base" search on which to do stuff. It turns out that stats, or more generally "stats style" result rows, are a better base to build on.

So here we do something strange and we use the untable command to convert from our nice easily-visualizable "chart style" rows, to the "stats style" rows.

<search terms> | timechart span=1min last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status

This resultset will now have rows, such that for every User, there's a row for every 1 minute period in the timerange, and that User has an explicit value for "Status". It has no gaps and at 12am everyone starts at the "out of office" Status.

(here's that link again if you want to read more about key differences between stats and chart)

And from this as our sort of "base", we can now branch out and do lots of fun reporting, like

example 1) How many users were in each given state during the course of the day:

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | timechart span=1min dc(User) by Status

example 2) Over the whole timerange, which user spent the least time in the "out of office" state.

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | search Status="out of office" | stats count as minutes by User | sort + minutes

example 3) With each user in their own row, what is the total distribution of minutes spent in each state.

<search terms> | timechart span=1min  last(Status) by User limit=500 | filldown * | fillnull value="out of office" | untable _time User Status | chart count as minutes over User by Status

4) your crazy idea here....

Now, if you're even still reading this, you might ask "if we need to end up with "stats style" rows, why don't we just do those earlier "filling in blank values" tasks over in "stats style" rows rather than doing them on chart output and then doing this confusing conversion?" The answer is that sometimes it's just a lot easier or more effective to do some tasks on one side vs the other. Here in particular although you could in theory use makecontinuous and streamstats and other advanced trickery to compute our "last known" stuff in a stats-style set, it's just really really painful. The chart side in this particular case just does those tasks more easily. Take another look at Example 4, where we're actually then converting again back to a different chart-style set.

somesoni2
Revered Legend

Simply Amazing!!!

0 Karma

woodcock
Esteemed Legend

Assuming you have a user field and a status field, start with a search like this (check out the visualization tab):

... | timechart span=1h dc(user) AS numUsersThisState BY status

JWBailey
Communicator

How will this count the same user with multiple events showing different statuses within the span? Will it count for each? If not which one will it count?

 9/28/2015 6:08:45, Sara, In The Office
 9/28/2015 6:14:36, Sara, On Break
 9/28/2015 6:21:00, Sara, Sleeping
0 Karma

woodcock
Esteemed Legend

It will count 1 for each state, mapped to the line for that state but only once (if he was in state 'Z' ten times from 10-11, he will be counted only once for state 'Z'.

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...