I have a KV store lookup with approximately 500k rows that is used on a regular basis in a number of searches we run daily to output enrichment data to our logs (email, phone number, location, etc.). When this lookup is used in conjunction with a | lookup command against a large search I can see in the inspector that the lookup command is taking almost 2 minutes at times to run through these large list searches. I've already tried using accelerated fields and minimizing the output fields, but saw little improvement. Is there anything else people use to improve search time performance on a KV store lookup when running against a large amount of data? On average the lookup for each large search is probably referencing between 20k-100k entries of the 500k total and outputting 5-7 fields.
@alexlombardi you can try out following two options:
Accelerated fields for the fields which are used for joining your SPLs to the KVStore (restrict fields to key input fields rather than enrichment output fields). Refer to answer: https://answers.splunk.com/answers/333540/for-large-lookups-should-i-use-the-kv-store.html
Indexer Replication for KV Store Knowledge Objects (this takes disk and network for replicating KVStore data as CSV bundle to indexer). This will allow map-reduce advantage in case only streaming commands are used prior to KVStore lookup. Refer to answer: https://answers.splunk.com/answers/390686/how-to-replicate-a-search-head-clusters-kv-store-l.html
Thanks for the suggestions. I am already replicating this KV store to our indexers, however I haven't seen much of a difference between using the local copies on the SHs versus using the indexer copies. As for the accelerated fields, I currently heave a setting for each field and I have not tried combining them as of yet. I'll need to test if there is any real difference in performance between the too.
@alexlombardi using index replication you cant take the advantage of map-reduce if lookup command is used after a transforming/non-streaming command and lookup is processed on SH instead of Indexers.
Refer to the following documentation: https://docs.splunk.com/Documentation/Splunk/latest/Search/Writebettersearches#Parallel_processing_e...
Also check out lookup command optimization: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_se...
Lookup command can be forced to run on SH but by default runs on Indexer provided all the commands prior to lookup are purely streaming. You would need to run performance tests to ensure position of lookup command as sometimes map-reduce can get you better results and other times transforming commands can reduce the result set to too low.
We have seen advantage of both the steps. Index replication of KV Store and accelerating only the required field usually it is only one or two per KV store like
unique ID field or
unique Name field. We store 3-5 million rows in the KV Stores.