Splunk Search

Counting Totals in CSV Lookup and return results to query

Explorer

Hi,
I have a simple query that counts the number of virus infected machines by different departments.

infection_status!=”infected” | stats count by department

resulting in:
HR – 10
IT – 2
FINANCE – 20

To achieve the above I have a lookup csv in place that links each machine name to a department. The csv looks sth. like this.
PC-1 - HR
PC-2 - IT
PC-3 - FINANCE
PC-4 – IT

I would like to put the count of infected assets in relation to the total number of known assets in each department. Hence I am looking for way to count the total number of known HR, IT, FINANCE machines (from the lookup csv) and use the results in my first query to get a infection rate, e.g.
HR – 10/100 = 10%
IT – 2/50 = 4%
FINANCE – 20/150 = 13,3%

Tags (3)
0 Karma

Motivator

try likr this:

| inputlookup "assets.csv" | stats count as total by Department | appendcols [ search  ... infection_status!=”infected” | stats count by department]| eval percentage=(count/total)*100 + "%"

Explorer

Thx fdi01!
This is much better.

0 Karma

Motivator

ok it is good for you Mr shakermaker,
u can up vote this answer.

0 Karma

Explorer

I think I solved it...If there is a smarter way, please do let me know.

The number of total assets per Department is stored in another lookup

| inputlookup "assetinfo.csv" | stats count As "TotalAssets" by Department | outputlookup "tmp_total_assets_by_Dept.csv"

In the main query I perform a lookup against the tmp lookup table...

infection_status!=”infected” | stats count As "InfectedAssets" by Department | lookup tmp_total_assets_by_Dept Department as Department OUTPUT "TotalAssets" | eval percentage=(InfectedAssets/TotalAssets)*100

,,,and than eval the two columns to get the percentage.

Motivator

So it seems like you have both parts of the query worked out it is just a matter of joining them. What I would like to do is join department to the infected status (in that order) but what you really want (from a query layout perspective) is to list the asset.csv part first as you might have a department with no infections. When you work with joins or appends the items in the bracket search needs to finish in under a minute and/or have fewer than 50k results. Your queries list "department" with different case so I'll role with that.

.... ok so thinking this through ... I'd use an append vs join though you might have to monkey with the part of the query after the append as I don't have your data to work with and test 😃

infection_status=”infected” | stats count as infected_systems by department | rename department as Department | append [ | inputlookup "assets.csv" | stats count as department_systems by Department]

This will result in a somewhat odd looking data set where you have departments listed first, column two will be your infections, and column three will be blank for the first part of the result and then have values for the bottom part but column two will be empty. To fix this and to account for departments that might not have infections try this

infection_status=”infected” | stats count as infected_systems by department | rename department as Department | append [ | inputlookup "assets.csv" | stats count as department_systems by Department] | stats values(*) as * by Department | fillnull value=0 infected_systems | eval infection_perc = if(infected_systems=0 , 0, round((infected_systems / department_systems)*100)) | table Department department_systems infected_systems infection_perc | rename <fields as desired> | fieldformat infected_perc = infected_perc . " %"

Obviously you could set the field names as desired before you get to the table part and not have to do the rename bit. If you wanted to add the "%" sign I'd do that as a fieldformat vs eval because you might want to sort by departments with the highest infection percent. If you add the percent sign to the field in an eval that becomes a string vs int. I also just realized if there are no infected systems Splunk might bark at dividing a 0 so adjusted the eval with a simple if statement.

Once everything is good you can adjust the table line to just have department and infected percent OR you could add another table line at the end. When someone clicks into that dashboard item it should, in effect, drill into the dashboard and show more information: Department, number of systems in department, number of infected systems, infection %

0 Karma

Explorer

I tried to run your proposal but received the following error.
Error in 'inputlookup' command: This command must be the first command of a search.

Did it work on your end?

0 Karma

Motivator

oh yeah - just take "search" out before the pipe of inputlookup. Will edit it out.

0 Karma

Path Finder

you have to use rex command to extract the fields: MachineID and Department.

    | inputlookup "assets.csv" |rex "(?P< MachineID>\w+\d)-(?P<Departement>\w+)"| stats count by Department
0 Karma

Explorer

Thank you for your help, but I am not sure if this is really it? Let me try to be a simple as possible…
If my first search returns total number of infected assets by Dept.)

**infection_status!=”infected” | stats count by department**
HR – 10
IT – 2
FINANCE – 20

And the second search returns the total number of known assets by Dept.

 | inputlookup "assets.csv" | stats count by Department
    HR – 100
    IT – 50
    FINANCE - 150

Then I just want to get a table showing the infection rate by Dept. (number of infected assets / number of known assets)

HR = 10%
IT=4%
FINANCE=13,3%
0 Karma

Explorer

The lookup CSV ("assets.csv") is structured like this:

MachineID - Department
PC1-HR
PC2-IT
PC3-FINANCE
PC4–IT
PC5-HR
PC6-HR
PC7-IT

I am able to calculate the total number of assets by department in the CSV using:

| inputlookup "assets.csv" | stats count by Department

which would return

HR – 3
IT – 3
FINANCE –1

But how do I take each of the values back to the main query?

0 Karma

Path Finder

can i see a sample date for your lookup csv?

0 Karma