There is a big difference in performance between using "inputlookup" and "lookup" in the following queries, run over the same time range. The "mylookup" object has about 3k records.
query 1:
index=xyz src=* [| inputlookup mylookup | fields src]
query 2:
index=xyz src=*
| lookup mylookup src as src output key
| where isnotnull(key)
I consistently got results from query 2, but query 1 got stuck. I expected query 1 to perform better than query 2. Any clues?
Thanks.
It's likely because of the subsearch present in one but not the other. For reasons I'm unaware of, Splunk's performance quickly degrades when using subsearches, so they should be avoided at all costs. In fact, if your lookup grew beyond 10,000 rows, the subsearch wouldn't be accurate unless you increased the maxout parameter in the [subsearch] stanza of limits.conf, because the default maximum number of events returned from a subsearch is 10,000.
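For reference, the stanza in question would look roughly like the fragment below. This is only a sketch: the value shown is the default quoted above, and the allowed range may differ by Splunk version, so check limits.conf.spec for your release before changing it.

```
# limits.conf (sketch; value is the default mentioned above)
[subsearch]
# maximum number of results a subsearch may return to the outer search
maxout = 10000
```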
My lookup is relatively small, containing only between 3,000 and 3,400 records. According to one of the Splunk best practices, "Filter as soon as possible", I think the first query should be better than the second one.
I don't know where the crossover for performance is, and I expect it changes depending on the deployment design. But staying in the tens is definitely safe.
Although "filter as soon as possible" is the general recommendation, the search inspector and introspection can help you choose the best command (inputlookup, lookup) for your data.
I believe that the server sends back a response that includes the entire expanded search string, which includes expanded inputlookup subsearches. In one case where a table was significantly increased in size, a search with a few results generated over 75MB of traffic from the server.
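If you want to see what the subsearch in query 1 actually expands to, one way is to run it on its own with the format command appended. A subsearch's results are implicitly passed through format before being substituted into the outer search. This is a sketch that assumes the lookup has a src field, as in the original question.

```
| inputlookup mylookup
| fields src
| format
```

The result is a single row whose search field holds one long expression of the form ( ( src="..." ) OR ( src="..." ) OR ... ), with one clause per lookup row. With a 3,000-row lookup that means about 3,000 OR clauses, all of which become part of the expanded search string.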
It depends on how big the list is. Use pattern 2 as a rule unless you know for SURE that the list will be short, as in tens of values.
The "mylookup" object has 3,000 records and is less than 250 KB.