I have a large KV store lookup (approximately 1.5-2 million rows and 4 columns), and I need to create a search that adds 2 new columns into it from corresponding data.
Essentially the lookup is like this:
Server | Time | Variable1 | Variable2 |
and I need it to look like this:
Server | Time | Variable1 | Variable2 | Variable3 | Variable4 |
My current search is like this:
index=index sourcetype=sourcetype
| stats count by Server Time Variable3 Variable4
| fields- count
| lookup mylookup Server Time OUTPUT Variable1 Variable2
| outputlookup mylookup
The problem I'm running into is that the search gets caught on that lookup command for 2+ hours, and I'm not sure why it's taking so long to match that data.
Does anyone have any insight on why that is occurring or how I can restructure my search to accomplish my need in a more efficient way? Or would it be better to try updating the kv store via restAPI from the same script that is generating variable3 and variable4?
Does the mylookup definition contain Variable1 and Variable2 as fields?
You will have to edit the collections.conf to add those new fields.
Also, are you wanting to update the same Server and Time row in the lookup - if so you will need to also output the _key variable from the lookup as that needs to be used by the lookup to update the same row. In this case, you look like you are adding new rows to the lookup.
Do you have accelerated fields in your lookup - see https://dev.splunk.com/enterprise/docs/developapps/manageknowledge/kvstore/usingconfigurationfiles/#...
that will improve the lookup time.
Yes, the lookup definition contains all necessary fields, including the new ones.
I am wanting to update the same Server and Time rows as well as add new ones.
I'm using Splunk cloud, so I'm not sure there's a way to use accelerated fields?
I expect if you are using cloud you will have to put in a support ticket
If you want to UPDATE existing rows in the data, then you must OUTPUT _key as with the other fields so that existing events that are found in the lookup will have the _key field, so it can be used to update the row. For Servers that are NOT found in the lookup, the _key will be null. When you use outputlookup use append=t so that it will add new entries if _key is null or update the existing ones where _key already exists.