Splunk Search
Highlighted

How to avoid subsearch auto-finalize in query performing outer join against lookup table

Path Finder

I've got an inventory list, which greatly simplified looks like below and made it available to splunk as a lookup table.

host,os_type
m00001,linux
m00002,linux
m00003,linux

What I want to do is list the number of records against the inventory, including where the count is 0.
The query below uses an outer join and works but for anything longer than a few minutes I get

[subsearch]: Search auto-finalized after time limit (60 seconds) reached.

To be of value the count, i.e. inner query, would need to run for relatively long periods, say 1 day.

| inputlookup server_list
| fields host  
| join type=outer host [ search index=some_index | stats count by host ]
| fillnull value=0 count 

I pondered using metadata but both metadata and inputlookup need to be the first command so that seems to be a non-starter.

I can use the REST API and get the desired result by effectively doing the outer join outside Splunk but I wonder what other options exist.
I am probably not able to increase the subquery auto-finalize limit.

0 Karma
Highlighted

Re: How to avoid subsearch auto-finalize in query performing outer join against lookup table

SplunkTrust
SplunkTrust

I have the similar setup (server name inventory) and below query works for me just fine using metadata. (you need to add a '|' before metadata to make is first command)

|inputlookup serverInventory.csv | fields serverName | rename serverName as host | join type=outer host [|metadata type=hosts index=some_index | table host, totalCount]
Highlighted

Re: How to avoid subsearch auto-finalize in query performing outer join against lookup table

Path Finder

Ah, didn't think of trying metadata or inputlookup as first element of a sub-query. This certainly opens up possibilities!

0 Karma
Highlighted

Re: How to avoid subsearch auto-finalize in query performing outer join against lookup table

Splunk Employee
Splunk Employee

There are a couple of ways. First, another answer has suggested using the metadata command, which is fine as long are you're just counting by host. If you're needing a more specific query, a count by host+source, or something else, that won't help you (though in version 6.0 you can doing things like | tstats count WHERE source=xyz GROUPBY host,source very quickly using any other indexed field, or you can similarly use an accelerated data model for more complex queries).

But I would say that you can just reverse the order in general:

index=x somekeywords | stats count by host | join host [ inputlookup server_list ]

This misses out any empty hosts, so you won't have zeros, though there are workarounds to this, like:

index=x somekeywords | append [ inputlookup server_list ] | stats count by host | eval count=count-1 | join host [ inputlookup server_list ]

This is really what you want, not to run the main search in a subsearch.

But really, the better answer for the functionality you want is simply a lookup, which you can configure as an auto lookup (see props.conf and transforms.conf) or inline:

index=x somekeywords | stats count by host | lookup server_list host

or

index=x somekeywords | append [ inputlookup server_list ] | stats count by host | eval count=count-1 | lookup server_list host

View solution in original post