Solved: Compare search to lookup table and return results ...

hlarimer · ‎10-23-2015

I have a search that tells me when a system doesn't report into splunk after a threshold of an hour:

Then I have a lookup csv that is driven by a dashboard that allows users to enter systems that are in maintenance mode or decommissioned. If a system hits the threshold from the first search but is listed in the csv I don't want it to fire the alert. The lookup table is called decom_maint_systems.csv.

I can get the correct hostname by using set diff but the problem is I lose the "last_seen" field which I really want to keep. So how do I compare the two while keeping the fields I want from the initial search?

acharlieh · ‎10-23-2015

As we discussed on the Splunk Usergroups Slack (signup request form) one option would be to use the subsearch retrieving the hosts in maintenance mode and inject that into a where clause... e.g.

| metadata ... | ... | where NOT [inputlookup ... | table host]

View solution in original post

woodcock · ‎10-23-2015

Like this:

|metadata index=vmware type=hosts | eval timenow=now() | eval lastseen=timenow-recentTime | where lastseen > 3600 | eval last_seen=tostring(lastseen, "duration") | eval type="search" | table host last_seen type| appendpipe [|inputlookup decom_main_systems.csv] | eval type=coalesce(type, "inputlookup") | stats dc(type) AS numTypes* AS * by host

This gives the fully merged set (full join). For each function, tack on the appropriate remaining search string:

For your case:

| where numTypes=1 AND type="search"

For XOR (outer join):

| where numTypes= 1

For left join:

| where type="search"

For right join:

| where type="inputlookup"

For inner join:

| where numTypes>1

acharlieh · ‎10-23-2015

Your setting of type for the inputlookup should then be a simple eval inside the subsearch of append, and you need to use a statistical function for all other fields... (e.g. stats values(*) as * by host ) But then you can pipe to a where command as you mention.

woodcock · ‎10-23-2015

Until recently, I would have given the same advice but using append incurs severe number-of-event limits which can be avoided by using appendcols instead. See this interesting Q&A for details:

https://answers.splunk.com/answers/318428/how-can-i-append-and-escape-the-50k-subsearch-limi.html

Given this, I stand by my answer as-is.

acharlieh · ‎10-23-2015

Did you know that in your link you're using appendpipe in all your searches, but drawing a conclusion about appendcols? It's an interesting abuse of appendpipe However one should note that in the general case appendpipe [search ...] will give you very different results than append [search ...] and appendcols [search ...] as the latter two go back to indexes from disk to add results, but the former will only add rows and filter results from before the command.

Actually as I was writing this, I was trying to figure out how appendpipe allowed you to use inputlookup and it seems like not only does appendpipe allow generation commands, but also there's a very useful append=true parameter in inputlookup so | append [inputlookup ] or | appendpipe [inputlookup ] is more simply written as | inputlookup append=true

For completeness I'll note that appendcols has the same subsearch limitations as append (note the same sections in the docs about subsearch options). And I'll note that appendcols will throw an error if not used after a reporting command like stats ("Error in 'appendcols' command: You can only use appendcols after a reporting command (such as stats, chart, or timechart)."). But ignoring all that appendcols is still the wrong command because it'll only add results if the set being appended is longer than the original set and it does not care about any matching of fields. In this specific case, let's have an example, where your search prior to appendcols returns 3 rows (in order) host A, B, and C. Let's say that your lookup contains only 2 rows, (in order) host D and B. From these two sets of data, we would want the resulting set to be 2 rows A and C however after the appendcols command your result set will be exactly the same as before the appendcols command (since host is in both sets of data, the host field is taken from the source set by default, and since your lookup has fewer rows than your source set, no additional results are added). Your eval with coalesce will do nothing since all of your results already have the field type set on them (from before the appendcols ) and thus nothing will be excluded with your where afterwards.

woodcock · ‎10-23-2015

All good points, especially about appendcols which I never should have mentioned, as you noted. The mistake in my answer has been corrected (re-edited); it is now appendpipe. I am not at all convinced that your commands equivalency is entirely correct but I will retry and see and update the other Q&A.

acharlieh · ‎10-23-2015

As we discussed on the Splunk Usergroups Slack (signup request form) one option would be to use the subsearch retrieving the hosts in maintenance mode and inject that into a where clause... e.g.

| metadata ... | ... | where NOT [inputlookup ... | table host]

hlarimer · ‎10-23-2015

That works perfectly! Thanks for your quick response!

Compare search to lookup table and return results unique to search

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!