I have a simple query that counts the number of virus infected machines by different departments.
infection_status!=”infected” | stats count by department
HR – 10
IT – 2
FINANCE – 20
To achieve the above I have a lookup csv in place that links each machine name to a department. The csv looks sth. like this.
PC-1 - HR
PC-2 - IT
PC-3 - FINANCE
PC-4 – IT
I would like to put the count of infected assets in relation to the total number of known assets in each department. Hence I am looking for way to count the total number of known HR, IT, FINANCE machines (from the lookup csv) and use the results in my first query to get a infection rate, e.g.
HR – 10/100 = 10%
IT – 2/50 = 4%
FINANCE – 20/150 = 13,3%
try likr this:
| inputlookup "assets.csv" | stats count as total by Department | appendcols [ search ... infection_status!=”infected” | stats count by department]| eval percentage=(count/total)*100 + "%"
I think I solved it...If there is a smarter way, please do let me know.
The number of total assets per Department is stored in another lookup
| inputlookup "assetinfo.csv" | stats count As "TotalAssets" by Department | outputlookup "tmp_total_assets_by_Dept.csv"
In the main query I perform a lookup against the tmp lookup table...
infection_status!=”infected” | stats count As "InfectedAssets" by Department | lookup tmp_total_assets_by_Dept Department as Department OUTPUT "TotalAssets" | eval percentage=(InfectedAssets/TotalAssets)*100
,,,and than eval the two columns to get the percentage.
So it seems like you have both parts of the query worked out it is just a matter of joining them. What I would like to do is join department to the infected status (in that order) but what you really want (from a query layout perspective) is to list the asset.csv part first as you might have a department with no infections. When you work with joins or appends the items in the bracket search needs to finish in under a minute and/or have fewer than 50k results. Your queries list "department" with different case so I'll role with that.
.... ok so thinking this through ... I'd use an append vs join though you might have to monkey with the part of the query after the append as I don't have your data to work with and test 😃
infection_status=”infected” | stats count as infected_systems by department | rename department as Department | append [ | inputlookup "assets.csv" | stats count as department_systems by Department]
This will result in a somewhat odd looking data set where you have departments listed first, column two will be your infections, and column three will be blank for the first part of the result and then have values for the bottom part but column two will be empty. To fix this and to account for departments that might not have infections try this
infection_status=”infected” | stats count as infected_systems by department | rename department as Department | append [ | inputlookup "assets.csv" | stats count as department_systems by Department] | stats values(*) as * by Department | fillnull value=0 infected_systems | eval infection_perc = if(infected_systems=0 , 0, round((infected_systems / department_systems)*100)) | table Department department_systems infected_systems infection_perc | rename <fields as desired> | fieldformat infected_perc = infected_perc . " %"
Obviously you could set the field names as desired before you get to the table part and not have to do the rename bit. If you wanted to add the "%" sign I'd do that as a fieldformat vs eval because you might want to sort by departments with the highest infection percent. If you add the percent sign to the field in an eval that becomes a string vs int. I also just realized if there are no infected systems Splunk might bark at dividing a 0 so adjusted the eval with a simple if statement.
Once everything is good you can adjust the table line to just have department and infected percent OR you could add another table line at the end. When someone clicks into that dashboard item it should, in effect, drill into the dashboard and show more information: Department, number of systems in department, number of infected systems, infection %
Thank you for your help, but I am not sure if this is really it? Let me try to be a simple as possible…
If my first search returns total number of infected assets by Dept.)
**infection_status!=”infected” | stats count by department** HR – 10 IT – 2 FINANCE – 20
And the second search returns the total number of known assets by Dept.
| inputlookup "assets.csv" | stats count by Department HR – 100 IT – 50 FINANCE - 150
Then I just want to get a table showing the infection rate by Dept. (number of infected assets / number of known assets)
HR = 10% IT=4% FINANCE=13,3%
The lookup CSV ("assets.csv") is structured like this:
MachineID - Department PC1-HR PC2-IT PC3-FINANCE PC4–IT PC5-HR PC6-HR PC7-IT
I am able to calculate the total number of assets by department in the CSV using:
| inputlookup "assets.csv" | stats count by Department
which would return
HR – 3 IT – 3 FINANCE –1
But how do I take each of the values back to the main query?