I have a query that shows observed category of domains (search engines, social media, streaming, etc.). I'd like to compare this to a lookup table that lists ALL possible categories. From here I'd like to finally generate a table showing the categories that are on the lookup table but NOT observed in the initial query.
Basically, generate a table that shows all unobserved web categories.
Observed Categories
index=web | dedup category | table category
Lookup Table
| inputlookup=all_categories.csv
Current Query
index=web | dedup category | table category | sort +category | join type=outer [inputlookup all_categories.csv]
I tried running a query for yesterday and I'm getting the same results for a query of observed yesterday and my lookup table comparison query (Current Query).
Try like this. This should give you all the categories which are in lookup but not returned by index=web query)
index=web | stats count by category | table category | eval Observed=1
| append [|inputlookup all_categories.csv | table category | eval Observed=0 ]
| stats max(Observed) as Observed by category | where Observed=0
Try like this. This should give you all the categories which are in lookup but not returned by index=web query)
index=web | stats count by category | table category | eval Observed=1
| append [|inputlookup all_categories.csv | table category | eval Observed=0 ]
| stats max(Observed) as Observed by category | where Observed=0
This works exactly how I wanted! Thank you!
Would you mind explaining how the Observed works in this query? I understand the logic of the query, but I'm confused how you can eval the Observed and somehow the lookup table assigned the '0' to each category while the first line associates the '1' value to the categories.
Is it basically making a second column called "Observed" where the value is '1' for the first part and '0' for the append portion? The stats max is also a bit confusing because if Observed = '1' then wouldn't it just return nothing since the lookup table made every category = '0' ? I'm sorry for all the questions, I'm trying to learn.
So you get a result set (list of categories) from index=web. We added a new field Observed with value 1 so that all categories that appeared in index=web will have Observed=1 (or true). Using numeric value for easier comparison. The we append 2nd result set, which is all categories from your lookup with a field Observed with value 0 (say Observed=0 means they are from Lookup table only). Since we append two result sets, there can be two entries for a category (one from index=web and one from lookup) so we add the stats command. The final stats checks what's the max value of Observed column. So if
1) Observed=1, means the category was available in index=web. It may or may not be available in lookup but since you say your lookup contains all the categories, Observed=1 means the category appeared in both index=web and lookup table.
2) Observed=0, means the category only appeared in lookup table.
This makes much more sense to me now. I was thinking it made two sets of lists, web with 1 and all with 0, but with max, it will sort and show only what I want. Again, thank you for further elaborating.