I'm trying to create a search to determine which hosts in a CSV file don't have any events associated with it within Splunk.
Essentially what I'm trying to do is target these specific hosts contained within this CSV file to determine the hosts that haven't had any vulnerability scans run on them in the last 30 days.
I already have a functioning inputlookup subsearch which pulls the hostname field from the csv and renames it as Hostname to match the event field in splunk.
From there I run the stats command to determine the latest time a log has come in (end_time) rename it lastTime and group by Hostname. Then run a few calculations to determine the duration between now and the last reported time, and finally run a search to display records that only have a difference greater than or equal to 30 days and display the results in a table with columns Hostname and "Last Reported".
[inputlookup hosts.csv | fields + hostname | rename hostname as Hostname] index=endpoint sourcetype=qualys | stats max(end_time) as lastTime by Hostname | convert timeformat="%Y-%m-%d %H:%M:%S" mktime(lastTime) | eval timeDiff=now()-lastTime | eval hourDiff=timeDiff/3600 | eval dayDiff=hourDiff/24 | convert ctime(lastTime) | sort -dayDiff | search dayDiff>=30 | rename lastTime AS "Last Reported" | table Hostname, "Last Reported"
Ideally, I'd like to be able to see a 1 to 1 association with all of the hosts in my csv file. If i have 50 records in a CSV file, my search should also return 50 results.
My problem however, is that as expected, the results are only returning records that have events associated with them. Meaning that I'm not reporting on hosts which have never been scanned.
Is there any way to return all of the hosts listed from my csv file and if there is no event record simply set the Last Reported date in my results table to NULL?
Thanks so much!
I would recommend doing this with a simple metadata search and a lookup file of expected hosts.
## transforms.conf [expected_host_lookup] filename = expected_hosts.csv ## expected_hosts.csv host,is_expected hostA,true hostB,true search: | metadata type=hosts | append[inputlookup expected_host_lookup | eval lastTime=0 | eval totalCount=0] | stats max(lastTime) as lastTime, sum(totalCount) as totalCount by host | lookup expected_host_lookup host OUTPUT is_expected | search is_expected=true totalCount=0
This search is extremely cheap since we are using metadata and a few simple lookups. The first inputlookup appends the expected hosts table and is then consolidated by stats. We do a subsequent lookup to get the is_expected values again. Finally, we search for expected hosts w/ a totalCount of 0. You could also perform a query based on lastTime if you are interested in hosts that haven't reported in recently....
Hmm. I think using the
join command may work better for you here. I'm assuming that your csv file and events both have a field called 'Hostname' that you are using to correlate between your two sources.
See if something like this works to pull your events together:
index=endpoint sourcetype=qualys | stats max(end_time) as lastTime by Hostname | join type=outer Hostname [ inputlookup hosts.csv | fields hostname | rename hostname as Hostname ] | ....
Without seeing your actual search it's hard to say what's going on. Go ahead and post your new search at the end of your question. (Click the "edit" link below your question.) From there myself or someone else may be able to give some additional suggestions. Keep in mind that the sub-search (the search in
) does have have a restricted number of results that it will return. You may find that a simple
|dedup hostname will get the job done.
Awesome. I think you got me on the right track.
I actually needed to flip the inputlookup and the rest of the search around though to work correctly with the outer join.
Since an outer join retains events that don't join with any of the rows in the subsearch, I needed the inputlookup as my primary and index=endpoint sourcetype=qualys as my subsearch. I want to show hosts in the csv with no splunk events.
It looks like it's working as expected, but it takes forever to run. I actually think it's timing out, as the last 5 hosts in the csv aren't being returned. Any thoughts?