I have a lookup table that lists all users along with their department like so:
email department
---------------------------------------
user1@company.com Sales
user2@company.com Engineering
user3@company.com Accounting
user4@company.com Sales
user5@company.com HR
I also have an index that list events for a particular application. The index contains lots of fields, but for my purposes, I'm really only interested in _time and actor.email.
My goal is to count the number of days per week every user in a given department logs events in the index even if that number is zero.
I can get pretty close to what I want with this search:
index=whatever <base search here>
| lookup user.csv email as actor.email OUTPUT department
| bin _time span=1d
| search department="Sales"
| stats count as numEvents by _time, actor.email
| eval weekNumber = strftime(_time,"%U")
| stats dc(_time) as numDays by actor.email, weekNumber
| xyseries actor.email, weekNumber, numDays
The problem with this search is that if there is a user in the lookup table who returned zero events during that time frame, they won't appear in the results.
I considered trying to append [|inputlookup user.csv] to the search, but because my append doesn't include a _time field, I can't get everything to line up correctly.
How do I run a search for every user in the correct department in the lookup table and return zero events per week if they didn't interact with the system?
The filldown command fills-in null fields with the last non-null value. In the less-simplified case, filldown will populate with "HR". Work around that by sorting by actor.email before using filldown.
BTW, because filldown only fills in null fields, it should be used before fillnull or fillnull should only fill in fields other than department (using | fillnull weekNumber numDays).
Thanks for the suggestions. Unfortunately both of these suggestions fail because of the time-series requirement (I'm trying to count the number of days in a given week that a user interacted with the system).
So when I add the append [|inputlookup ...] to the end of the search, the follow with a | stats max(*) as * by actor.email, _time, I get something like this:
actor.email weekNumber department numDays
------------------------------------------------------------------------------
user1@company.com 0 Sales 0
user1@company.com 15 0 4
user1@company.com 16 0 3
user1@company.com 17 0 3
Since I want to have the count by actor.email and weekNumber how can I apply the department tag if the inputlookup doesn't have a matching weekNumber for every possible row?
Did you try my suggestion?
Here's a working example that fills in values of 0 for users E and F and for user B in a couple of weeks.
It generates some dummy data with 800 events for each of users A-D. The lookup users.csv has 6 users A-F
| makeresults
| eval email=split("ABCD", "")
| mvexpand email
| eval email=email."@company.com"
| eval r=mvrange(1,801,1)
| mvexpand r
| streamstats c by email
| eval _time=if(match(email,"^B"), case(r<200,now() - (c*60), r<300,now() - (c*3600), r<400,now() - (c*60), r<600,now() - (c*7200), 1==1, now() - 3600), now() - (c*3600))
| bin _time span=1d
| stats count as numEvents by _time, email
| eval weekNumber = strftime(_time,"%U")
| stats dc(_time) as numDays by email weekNumber
| xyseries email, weekNumber, numDays
| append [
| inputlookup users.csv
| fields email
]
| rename email as actor.email
| fillnull
| stats max(*) as * by actor.email
| lookup users.csv email as actor.email
Lookup file contains
email dept
A@company.com Marketing
B@company.com Sales
C@company.com Engineering
D@company.com Accounts
E@company.com Sales
F@company.com Accounts
and search output gives something like
actor.email 11 12 13 14 15 16 17 18 dept
A@company.com 0 0 1 7 7 7 7 5 Marketing
B@company.com 4 7 7 0 0 2 4 1 Sales
C@company.com 0 0 1 7 7 7 7 5 Engineering
D@company.com 0 0 1 7 7 7 7 5 Accounts
E@company.com 0 0 0 0 0 0 0 0 Sales
F@company.com 0 0 0 0 0 0 0 0 Accounts
You can see that users E and F are included with 0 in all their weeks and user B has a couple of zeros in the middle and then weeks 11-11 are empty for A, C, D
You can see the partial results by looking at the table before the append is done
It's not pretty, but you could use filldown to populate the department field and then remove the row without a week number.
...
| filldown department
| where weekNumber > 0
Thanks, @richgalloway. Will filldown work if I have different department values? In my last example, I simplified the source table. if my source looks more like this:
actor.email department weekNumber numDays
---------------------------------------------------------------------------
user1@company.com Sales 0 0
user2@company.com Engineering 0 0
user3@company.com HR 0 0
user1@company.com 0 13 3
user2@company.com 0 13 1
user3@company.com 0 13 3
user1@company.com 0 14 2
user2@company.com 0 14 1
user3@company.com 0 14 3
user1@company.com 0 15 2
user2@company.com 0 15 3
user3@company.com 0 15 3
Is it possible to apply the "fill" command by field (like actor.email)? In other words, I want all user1@company.com events to have a department of "Sales".
The filldown command fills-in null fields with the last non-null value. In the less-simplified case, filldown will populate with "HR". Work around that by sorting by actor.email before using filldown.
BTW, because filldown only fills in null fields, it should be used before fillnull or fillnull should only fill in fields other than department (using | fillnull weekNumber numDays).
Add this onto the end of the search
| append [
| inputlookup users.csv where department="Sales"
| fields email
| rename email as "actor.email"
]
| fillnull
| stats max(*) as * by actor.email
Should do the trick
You're going to need that inputlookup command. Finding something that is not there is not Splunk's strong suit. See this blog entry for a good write-up on it.
https://www.duanewaddle.com/proving-a-negative/