Hi all,
I'm looking to create a lookup table and wondering what the best practice is.
The data source is SQL, and I already have a dbxlookup that imports data from SQL to Splunk.
I would like to build a lookup table that is refreshed from SQL twice a day.
I would like my search to read the data from the lookup and not from SQL.
The table currently contains 1,000 rows, and I do not expect it to exceed 10,000 rows.
I would appreciate it if someone could guide me to the best approach.
Thanks in advance,
Since you want to refresh your data and want to ensure the lookup doesn't get emptied if your DB query fails, you can structure your lookup generation query like this:
| dbxquery .... | inputlookup yourLookup.csv append=t | dedup ...columns that uniquely identify a lookup row... | outputlookup yourLookup.csv
This refreshes your lookup table with the new data coming from dbxquery while retaining the old values. It will give a warning the first time you run it, because yourLookup.csv will not exist before the query's first run.
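To make that concrete, here is a sketch assuming a hypothetical DB Connect connection named my_sql_conn and a column id that uniquely identifies a row:

| dbxquery connection="my_sql_conn" query="SELECT id, name, status FROM my_table"
| inputlookup yourLookup.csv append=t
| dedup id
| outputlookup yourLookup.csv

Because dedup keeps the first occurrence of each id, the fresh rows coming from dbxquery win over the older rows appended from the lookup, while rows that have disappeared from the database are still retained.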
Is there a penalty for the order in which you query your input data?
I tend to start appending in/out searches with
| inputlookup .... | my new search... (dbx in this case) | outputlookup ....
I have no technical reason for doing it in this order (other than it sort of feels logically right), but having read your suggestion, which is better than mine, I now wonder about the performance.
The dedup command keeps the latest data, i.e., whatever appears first in the result set, which should be the rows from the source, so the main search comes first. Another reason to run dbxquery as the base search is that it may take a significant time to execute, depending on the query, and we don't want subsearch limits to apply to it (a subsearch can be auto-finalized if it runs too long or exceeds the space limit).
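If you want to see the keep-first behavior of dedup for yourself, you can reproduce it with generated events only, no database or lookup needed:

| makeresults | eval id=1, status="fresh (from the base search)"
| append [| makeresults | eval id=1, status="stale (from the appended lookup)"]
| dedup id

Only the "fresh" row survives, which is why the new data has to come first; running inputlookup first would keep the stale values instead.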
Take a look at outputlookup:
...your_search... | table cola, colb, colc, cold | outputlookup my-lookup.csv
Add that to a scheduled search, and it will create a new lookup file each time it runs.
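Since you want the refresh to happen twice a day, one way to schedule it is a saved search with a cron schedule. A sketch of a savedsearches.conf stanza, where the stanza name, connection, and columns are hypothetical placeholders:

[refresh_my_lookup]
enableSched = 1
cron_schedule = 0 6,18 * * *
search = | dbxquery connection="my_sql_conn" query="SELECT cola, colb, colc, cold FROM my_table" | table cola colb colc cold | outputlookup my-lookup.csv

This runs at 06:00 and 18:00 every day; the same schedule can also be set up in the UI under Settings » Searches, reports, and alerts.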
Thanks, works great!