Solved: Re: Counting events for all users in a lookup tabl...

gfisbeck · ‎05-03-2022

I have a lookup table that lists all users along with their department like so:

email              department
---------------------------------------
user1@company.com  Sales
user2@company.com  Engineering
user3@company.com  Accounting
user4@company.com  Sales
user5@company.com  HR

I also have an index that list events for a particular application. The index contains lots of fields, but for my purposes, I'm really only interested in _time and actor.email.

My goal is to count the number of days per week every user in a given department logs events in the index even if that number is zero.

I can get pretty close to what I want with this search:

index=whatever <base search here>
| lookup user.csv email as actor.email OUTPUT department
| bin _time span=1d
| search department="Sales"
| stats count as numEvents by _time, actor.email
| eval weekNumber = strftime(_time,"%U")
| stats dc(_time) as numDays by actor.email, weekNumber
| xyseries actor.email, weekNumber, numDays

The problem with this search is that if there is a user in the lookup table who returned zero events during that time frame, they won't appear in the results.

I considered trying to append [|inputlookup user.csv] to the search, but because my append doesn't include a _time field, I can't get everything to line up correctly.

How do I run a search for every user in the correct department in the lookup table and return zero events per week if they didn't interact with the system?

richgalloway · ‎05-04-2022

The filldown command fills-in null fields with the last non-null value. In the less-simplified case, filldown will populate with "HR". Work around that by sorting by actor.email before using filldown.

BTW, because filldown only fills in null fields, it should be used before fillnull or fillnull should only fill in fields other than department (using | fillnull weekNumber numDays).

---
If this reply helps you, Karma would be appreciated.

View solution in original post

gfisbeck · ‎05-04-2022

Thanks for the suggestions. Unfortunately both of these suggestions fail because of the time-series requirement (I'm trying to count the number of days in a given week that a user interacted with the system).

So when I add the append [|inputlookup ...] to the end of the search, the follow with a | stats max(*) as * by actor.email, _time, I get something like this:

actor.email          weekNumber         department          numDays
------------------------------------------------------------------------------
user1@company.com    0                  Sales               0
user1@company.com    15                 0                   4
user1@company.com    16                 0                   3
user1@company.com    17                 0                   3

Since I want to have the count by actor.email and weekNumber how can I apply the department tag if the inputlookup doesn't have a matching weekNumber for every possible row?

bowesmana · ‎05-04-2022

Did you try my suggestion?

Here's a working example that fills in values of 0 for users E and F and for user B in a couple of weeks.

It generates some dummy data with 800 events for each of users A-D. The lookup users.csv has 6 users A-F

| makeresults 
| eval email=split("ABCD", "")
| mvexpand email
| eval email=email."@company.com"
| eval r=mvrange(1,801,1)
| mvexpand r
| streamstats c by email
| eval _time=if(match(email,"^B"), case(r<200,now() - (c*60), r<300,now() - (c*3600), r<400,now() - (c*60), r<600,now() - (c*7200), 1==1, now() - 3600), now() - (c*3600))
| bin _time span=1d
| stats count as numEvents by _time, email
| eval weekNumber = strftime(_time,"%U")
| stats dc(_time) as numDays by email weekNumber
| xyseries email, weekNumber, numDays
| append [
  | inputlookup users.csv
  | fields email
]
| rename email as actor.email
| fillnull
| stats max(*) as * by actor.email
| lookup users.csv email as actor.email

Lookup file contains

email	dept
A@company.com	Marketing
B@company.com	Sales
C@company.com	Engineering
D@company.com	Accounts
E@company.com	Sales
F@company.com	Accounts

and search output gives something like

actor.email	11	12	13	14	15	16	17	18	dept
A@company.com	0	0	1	7	7	7	7	5	Marketing
B@company.com	4	7	7	0	0	2	4	1	Sales
C@company.com	0	0	1	7	7	7	7	5	Engineering
D@company.com	0	0	1	7	7	7	7	5	Accounts
E@company.com	0	0	0	0	0	0	0	0	Sales
F@company.com	0	0	0	0	0	0	0	0	Accounts

You can see that users E and F are included with 0 in all their weeks and user B has a couple of zeros in the middle and then weeks 11-11 are empty for A, C, D

You can see the partial results by looking at the table before the append is done

richgalloway · ‎05-04-2022

It's not pretty, but you could use filldown to populate the department field and then remove the row without a week number.

...
| filldown department
| where weekNumber > 0

---
If this reply helps you, Karma would be appreciated.

gfisbeck · ‎05-04-2022

Thanks, @richgalloway. Will filldown work if I have different department values? In my last example, I simplified the source table. if my source looks more like this:

actor.email            department         weekNumber      numDays
---------------------------------------------------------------------------
user1@company.com      Sales              0               0
user2@company.com      Engineering        0               0
user3@company.com      HR                 0               0
user1@company.com      0                  13              3
user2@company.com      0                  13              1
user3@company.com      0                  13              3
user1@company.com      0                  14              2
user2@company.com      0                  14              1
user3@company.com      0                  14              3
user1@company.com      0                  15              2
user2@company.com      0                  15              3
user3@company.com      0                  15              3

Is it possible to apply the "fill" command by field (like actor.email)? In other words, I want all user1@company.com events to have a department of "Sales".

richgalloway · ‎05-04-2022

The filldown command fills-in null fields with the last non-null value. In the less-simplified case, filldown will populate with "HR". Work around that by sorting by actor.email before using filldown.

BTW, because filldown only fills in null fields, it should be used before fillnull or fillnull should only fill in fields other than department (using | fillnull weekNumber numDays).

---
If this reply helps you, Karma would be appreciated.

bowesmana · ‎05-03-2022

Add this onto the end of the search

| append [
  | inputlookup users.csv where department="Sales"
  | fields email
  | rename email as "actor.email"
]
| fillnull
| stats max(*) as * by actor.email

Should do the trick

richgalloway · ‎05-03-2022

You're going to need that inputlookup command. Finding something that is not there is not Splunk's strong suit. See this blog entry for a good write-up on it.

https://www.duanewaddle.com/proving-a-negative/

---
If this reply helps you, Karma would be appreciated.

How to count events for all users in a lookup table, even if some users return no results?

fields

join

lookup

subsearch

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms