I have a search that tells me when a system doesn't report into splunk after a threshold of an hour:
|metadata index=vmware type=hosts | eval timenow=now() | eval lastseen=timenow-recentTime | where lastseen > 3600 | eval last_seen=tostring(lastseen, "duration") | table host last_seen
Then I have a lookup csv that is driven by a dashboard that allows users to enter systems that are in maintenance mode or decommissioned. If a system hits the threshold from the first search but is listed in the csv I don't want it to fire the alert. The lookup table is called decom_maint_systems.csv.
I can get the correct hostname by using set diff but the problem is I lose the "last_seen" field which I really want to keep. So how do I compare the two while keeping the fields I want from the initial search?
As we discussed on the Splunk Usergroups Slack (signup request form) one option would be to use the subsearch retrieving the hosts in maintenance mode and inject that into a where clause... e.g.
| metadata ... | ... | where NOT [inputlookup ... | table host]
Like this:
|metadata index=vmware type=hosts | eval timenow=now() | eval lastseen=timenow-recentTime | where lastseen > 3600 | eval last_seen=tostring(lastseen, "duration") | eval type="search" | table host last_seen type| appendpipe [|inputlookup decom_main_systems.csv] | eval type=coalesce(type, "inputlookup") | stats dc(type) AS numTypes* AS * by host
This gives the fully merged set (full join). For each function, tack on the appropriate remaining search string:
For your case:
| where numTypes=1 AND type="search"
For XOR (outer join):
| where numTypes= 1
For left join:
| where type="search"
For right join:
| where type="inputlookup"
For inner join:
| where numTypes>1
Be careful... I think you want append instead of appendcols as the former adds new events to the result set the latter just tacks on fields to existing events (without regard to fields matching, it's just first results are combined, second results, etc) For example, check out: |noop | stats count | fields | eval foo=mvrange(1,10,1) | mvexpand foo | appendcols [noop | stats count | fields | eval bar=mvrange(1,10,1) | mvexpand bar |reverse]
Your setting of type for the inputlookup should then be a simple eval inside the subsearch of append, and you need to use a statistical function for all other fields... (e.g. stats values(*) as * by host
) But then you can pipe to a where command as you mention.
Until recently, I would have given the same advice but using append
incurs severe number-of-event limits which can be avoided by using appendcols
instead. See this interesting Q&A for details:
https://answers.splunk.com/answers/318428/how-can-i-append-and-escape-the-50k-subsearch-limi.html
Given this, I stand by my answer as-is.
Did you know that in your link you're using appendpipe
in all your searches, but drawing a conclusion about appendcols
? It's an interesting abuse of appendpipe
However one should note that in the general case appendpipe [search ...]
will give you very different results than append [search ...]
and appendcols [search ...]
as the latter two go back to indexes from disk to add results, but the former will only add rows and filter results from before the command.
Actually as I was writing this, I was trying to figure out how appendpipe allowed you to use inputlookup and it seems like not only does appendpipe allow generation commands, but also there's a very useful append=true
parameter in inputlookup so | append [inputlookup ]
or | appendpipe [inputlookup ]
is more simply written as | inputlookup append=true
For completeness I'll note that appendcols
has the same subsearch limitations as append
(note the same sections in the docs about subsearch options). And I'll note that appendcols
will throw an error if not used after a reporting command like stats ("Error in 'appendcols' command: You can only use appendcols after a reporting command (such as stats, chart, or timechart)."). But ignoring all that appendcols
is still the wrong command because it'll only add results if the set being appended is longer than the original set and it does not care about any matching of fields. In this specific case, let's have an example, where your search prior to appendcols returns 3 rows (in order) host A, B, and C. Let's say that your lookup contains only 2 rows, (in order) host D and B. From these two sets of data, we would want the resulting set to be 2 rows A and C however after the appendcols command your result set will be exactly the same as before the appendcols command (since host is in both sets of data, the host field is taken from the source set by default, and since your lookup has fewer rows than your source set, no additional results are added). Your eval with coalesce will do nothing since all of your results already have the field type set on them (from before the appendcols ) and thus nothing will be excluded with your where afterwards.
All good points, especially about appendcols
which I never should have mentioned, as you noted. The mistake in my answer has been corrected (re-edited); it is now appendpipe
. I am not at all convinced that your commands equivalency is entirely correct but I will retry and see and update the other Q&A.
As we discussed on the Splunk Usergroups Slack (signup request form) one option would be to use the subsearch retrieving the hosts in maintenance mode and inject that into a where clause... e.g.
| metadata ... | ... | where NOT [inputlookup ... | table host]
That works perfectly! Thanks for your quick response!