Splunk Search

Creating a scheduled lookup table

eranday
New Member

Hi all,

I'm looking to create a lookup table and wondering what the best practice is.

The data source is SQL, and I already have a dbxlookup that imports data from SQL into Splunk.
I would like the lookup table to be refreshed from SQL twice a day.
I would like my searches to read the data from the lookup rather than from SQL directly.
The table currently contains about 1,000 rows, and I do not expect it to exceed 10,000 rows.

I would appreciate it if someone could guide me toward the best approach.
Thanks in advance,

1 Solution

somesoni2
Revered Legend

Since you want to refresh your data, and want to ensure it doesn't get emptied if your DB query fails, you can structure your lookup-generation query like this:

| dbxquery ... | inputlookup yourLookup.csv append=t | dedup <columns that uniquely identify a lookup row> | outputlookup yourLookup.csv

This refreshes the lookup table with new data coming from dbxquery while retaining the old values. It will give a warning the first time you run it, since yourLookup.csv will not exist before the query's first run.
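A rough Python analogy (not SPL, and the field names are made up for illustration) may help show why the append + dedup pattern above is resilient: new rows from the DB come first, the old lookup rows are appended after them, and dedup keeps only the first occurrence of each key, so fresh DB values win while rows missing from a failed or partial DB pull survive.

```python
def refresh_lookup(db_rows, old_lookup, key):
    """Merge fresh DB rows with the old lookup, keeping the first
    occurrence of each key (mirrors `| dedup key` semantics)."""
    seen = set()
    merged = []
    for row in db_rows + old_lookup:  # DB results first, old lookup appended
        if row[key] not in seen:      # keep only the first row per key
            seen.add(row[key])
            merged.append(row)
    return merged

# Hypothetical example data:
db_rows = [{"host": "web01", "owner": "alice"}]            # fresh pull
old_lookup = [{"host": "web01", "owner": "bob"},           # stale value
              {"host": "db01", "owner": "carol"}]          # missing from pull
print(refresh_lookup(db_rows, old_lookup, "host"))
# web01 keeps the fresh value (alice); db01 is retained from the old lookup
```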



nickhills
Ultra Champion

Is there a penalty for the order in which you query your input data? I tend to write append in/out searches as:

| inputlookup ... | my new search (dbx in this case) | outputlookup ...

I have no technical reason for doing it in this order (other than it sort of feels logically right), but having read your suggestion (better than mine), I now wonder about the performance?

If my comment helps, please give it a thumbs up!

somesoni2
Revered Legend

The dedup command keeps the first occurrence of each value, i.e. the data that appears first in the result set, which should be the fresh data from the source, so the main search comes first. Another reason to run dbxquery as the base search is that it may take a significant time to execute, depending on the query, and we don't want subsearch limits to apply to it (auto-finalization due to a long run time or the result-size limit).

nickhills
Ultra Champion

Take a look at outputlookup:

...your_search... | table cola colb colc cold | outputlookup my-lookup.csv

Add that to a scheduled search, and it will create a new lookup file each time it runs.
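As a sketch of the scheduling step, the saved search could be defined in savedsearches.conf roughly like this (the stanza name, run times, and table fields are hypothetical; the dbxquery arguments are left elided since they depend on your connection):

```
[refresh_my_lookup]
search = | dbxquery ... | table cola colb colc cold | outputlookup my-lookup.csv
enableSched = 1
# twice a day, at 06:00 and 18:00
cron_schedule = 0 6,18 * * *
```

The same schedule can of course be configured from the Splunk UI under Settings > Searches, reports, and alerts.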


eranday
New Member

Thanks, works great!
