I have a KV store lookup with approximately 500k rows that is used on a regular basis in a number of searches we run daily to output enrichment data to our logs (email, phone number, location, etc.). When this lookup is used in conjunction with a | lookup command against a large search I can see in the inspector that the lookup command is taking almost 2 minutes at times to run through these large list searches. I've already tried using accelerated fields and minimizing the output fields, but saw little improvement. Is there anything else people use to improve search time performance on a KV store lookup when running against a large amount of data? On average the lookup for each large search is probably referencing between 20k-100k entries of the 500k total and outputting 5-7 fields.
I have 2 kv_store of each around 200,000 records and each kv_store of around 50MB size
| inputlookup kv_store1
| lookup kv_store2 common_field output req_field1
This query takes around 500 to 600 seconds to run.
any suggestions to get the result in around 10 seconds?
Please suggest
@alexlombardi you can try out following two options:
Accelerated fields
for the fields which are used for joining your SPLs to the KVStore (restrict fields to key input fields rather than enrichment output fields). Refer to answer: https://answers.splunk.com/answers/333540/for-large-lookups-should-i-use-the-kv-store.html
Indexer Replication
for KV Store Knowledge Objects (this takes disk and network for replicating KVStore data as CSV bundle to indexer). This will allow map-reduce advantage in case only streaming commands are used prior to KVStore lookup. Refer to answer: https://answers.splunk.com/answers/390686/how-to-replicate-a-search-head-clusters-kv-store-l.html
@niketnilay
Thanks for the suggestions. I am already replicating this KV store to our indexers, however I haven't seen much of a difference between using the local copies on the SHs versus using the indexer copies. As for the accelerated fields, I currently heave a setting for each field and I have not tried combining them as of yet. I'll need to test if there is any real difference in performance between the too.
@alexlombardi using index replication you cant take the advantage of map-reduce if lookup command is used after a transforming/non-streaming command and lookup is processed on SH instead of Indexers.
Refer to the following documentation: https://docs.splunk.com/Documentation/Splunk/latest/Search/Writebettersearches#Parallel_processing_e...
https://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Parallelreduceoverview
Also check out lookup command optimization: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_se...
Lookup command can be forced to run on SH but by default runs on Indexer provided all the commands prior to lookup are purely streaming. You would need to run performance tests to ensure position of lookup command as sometimes map-reduce can get you better results and other times transforming commands can reduce the result set to too low.
We have seen advantage of both the steps. Index replication of KV Store and accelerating only the required field usually it is only one or two per KV store like unique ID field
or unique Name field
. We store 3-5 million rows in the KV Stores.