Solved: How to count the number of occurrences of a string...

pfhendr · ‎02-09-2018

I have a team of Unix admins, each of which manages multiple applications. I created a CSV lookup file that contains the name of the application, the primary admin responsible for it and one or more backup admins in case the primary is unavailable. It is very useful to keep track of who is doing what, and who to assign a new app to when a new one comes in.

I'm able to create a simple bar chart showing the number of apps assigned to each primary admin, but I'm not having any luck in getting the number of admins that are a backup to each project.

Here is an example of my CSV file:

Application,Primary,Backup1,Backup2
App1,Tom,Dick,Harry
App2,Dick,Tom,
App3,Jim,Harry,Fred
App4,Fred,Tom,
App5,Tom,Fred,Harry

What I would like to see is a list of each admin and a count of apps each one is a primary for plus a count of apps they are a backup for, something like:

Admin    Primary  Backup
Tom       2         2
Dick      1         1
Harry     0         3
Fred      1         2

It's pretty easy to get a count of apps each one is a primary for:

| inputlookup AdminAppSupport.csv| stats count as "Primary Apps" by Primary
| sort -"Primary Apps"

But all my attempts to do a count of admins that are in Backup1 or Backup2 fail.

Any pointers would be greatly appreciated.

elliotproebstel · ‎02-09-2018

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0

View solution in original post

elliotproebstel · ‎02-09-2018

There has to be a cleaner way to do this, but this at least gets the job done:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2] 
| where isnull(Application) 
| eval Admin=coalesce(Primary, Backup1, Backup2) 
| stats max(*_Apps) AS * BY Admin 
| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2 
| fillnull value=0

pfhendr · ‎02-09-2018

Thanks! That gives me exactly what I'm looking for. Now to digest how it's doing it. 🙂

elliotproebstel · ‎02-09-2018

The best way to peek under the hood would be to start by running this chunk:

| inputlookup AdminAppSupport.csv 
| appendpipe 
    [| stats count AS Primary_Apps BY Primary ] 
| appendpipe 
    [| stats count AS Backup1_Apps BY Backup1] 
| appendpipe 
    [| stats count AS Backup2_Apps BY Backup2]

The three appendpipe commands are generating the stats per category. The table is all a mess, but the data is there in the rows that don't contain a value for Application. That leads us into the next line of code, where we remove all the lines with values for Application:

| where isnull(Application)

Now we need a list of all unique Admin names. Given that each row is guaranteed to only contain one Admin's name (regardless of whether they were Primary, Backup1, or Backup2), we use coalesce to grab whichever value is not null:

| eval Admin=coalesce(Primary, Backup1, Backup2)

Next, we need to make this table more readable by pushing it together so that each Admin has a single row, containing the max value from each category (P, B1, B2). We could just as easily have chosen the min value, since there will only be one value - but stats requires some kind of function, and I chose this one.

| stats max(*_Apps) AS * BY Admin

But you wanted to combine the Backup1 and Backup2 counts, so we need to add them together into a single Backup field and remove the B1 and B2 fields:

| eval Backup=Backup1+Backup2 
| fields - Backup1 Backup2

Finally, we add in a fillnull so that the table is nicely populated and not full of blank spots.

| fillnull value=0

I hope this helps your digestion. 🙂

pfhendr · ‎02-09-2018

When I used this on my actual csv file I found that some of the counts for the Backup field wasn't what I expected. I eventually moved the fillnull value=0 line to just above the stats max() line, now it's working as expected. I'm guessing the stats command couldn't handle the null values.

elliotproebstel · ‎02-09-2018

Ahh, good catch!

pfhendr · ‎02-09-2018

Yep, thanks, that makes it much clearer!

How to count the number of occurrences of a string in multiple fields?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits

Join the Conversation

How to count the number of occurrences of a string in multiple fields?

Index This | What is broken 80% of the time by February?

Unlock Faster Time-to-Value on Edge and Ingest Processor with New SPL2 Pipeline ...

Splunk MCP & Agentic AI: Machine Data Without Limits