I have tried all of the examples but am still not getting accurate results. I have a lookup table with (1) column only, a list of hostnames under the heading "name". I have an index that is the output of an internal scanner with an event field "dvc". I need a search that will be used on a regular basis to compare the lookup table and scan results with the output being those hostname that appear in the scan list, but not in index during a particular period of time.
The problem is, with the search below, I get inaccurate results. What I get is the first host listed in the lookup table (in the uploaded CSV) and nothing more. What I expected as output was a much longer list of "dvc" that existed in both the lookup table, and the index. When I manually validate I find many records in the lookup table that ARE in the index.
The host that does appear, is accurate, but am missing many others that should be there. I am suspicious the output is stopping after the first lookup table entry. I checked the spelling of all of my hostnames in both the lookup table and the index, focusing on those that should be appearing. Spelling is exactly the same, including case.
index=scanner dvc=* [|inputlookup test_lookup.csv | fields name | rename name AS dvc] earliest="02/01/2017:00:00:01" latest="02/02/2017:23:59:59" | dedup dvc | table dvc, _time | sort dvc
First, if you want the records that are in the lookup but not in the index, then the lookup table has to be the driver, it cannot be a filter for the events.
Second, if you have a summary index that covers the events you want, and you don't need any event detail data, then use tstats against the summary index instead of using the detail index. I've assumed in the following code that no summary index is available.
Try this -
earliest="02/01/2017:00:00:01" latest="02/02/2017:23:59:59" index=scanner dvc=* | table _time dvc | stats max(_time) as latest, count as reccount by dvc | append [|inputlookup test_lookup.csv | table name | rename name AS dvc | eval reccount=0 ] | stats max(latest) as latest, sum(reccount) as reccount by dvc | where reccount=0
There is obviously something else going on that I am not seeing. Your query (which is appreciated) returns the same as mine, but the opposite: mine returns hosts that do NOT appear in the index and the lookup but with inaccurate results, yours returns hosts that DO appear in the index and the lookup but again with inaccurate results. Both are returning slightly different results
From both queries, I am receiving results that say "these host are not in both result sets (the index & lookup), yet when I look for the host in only one source at a time using the same parameters (field = dvc), the reported missing hosts are clearly there.
Okay, let's see what is going on here.
First, remove the last line, so all the hosts in the lookup and the index will be there. Hmm. let's set a flag for whether it was on the lookup as well.
earliest="02/01/2017:00:00:01" latest="02/02/2017:23:59:59" index=scanner dvc=* | table _time dvc | stats max(_time) as latest, count as reccount by dvc | eval lookcount =0 | append [|inputlookup test_lookup.csv | table name | rename name AS dvc | eval reccount=0 | eval lookcount=1 | stats sum(reccount) as reccount, sum(lookcount) as lookcount by dvc] | stats max(latest) as latest, sum(reccount) as reccount, sum(lookcount) as lookcount by dvc
Edited to do "stats" on the lookup table in case we had multiples.
Using lookups as a subsearch filter works. It can be limiting if the table is large. Granted it might not be in your case but this is a handy pattern. change isnull to isnotnull if you want to flip the results.
index=scanner | stats count by dvc | lookup test_lookup.csv name AS dvc OUTPUTNEW name as isFound | where isnull(isFound)
I was hopeful (and excited) that I could put this behind me, but.... I ran the query as written with one addition: just after index=scanner I included "earliest & latest" to keep the known hosts the same for all testing. Looked good. For isnull output of 124 hosts. For isnotnull output of 1, There is a total of 135 test records? I randomly compared the isnull output to the lookup file (which only has one column) and most identified as isnull were in both the lookup and the events 😞
The discrepancy has been driving me batty all day. The lookup file has only a single column of hostnames (dvc), with the exact spelling as those records that continue to show up in my random checks against the events. Missing something not so obvious?
Going to export everything into a standalone installation and hopefully the problem surface
I see this is a similar, but not the same. I cannot use metadata however, but only an index.
The goal is to see what fqdn values are in the CSV, but not the index.
This gives me the reverse of what i need - it gives me what's only in the index. I need what is NOT in the index, but IS in the CSV.
index=securityidx | rex field=fqdn "(?[^.]*)" | stats count by fqdn index | where NOT [inputlookup securitymanaged.csv | eval fqdn=lower(fqdn) | table fqdn]
This just merges the two fqdn fields together in the outputlookup, so I get all the values, not just what is missing from the index.
| inputlookup securitymanaged.csv | eval fqdn=lower(fqdn) | where NOT [search index=securityav | rex field=fqdn "(?[^.]*)" | eval fqdn=lower(fqdn)]