I am planning to set up a query in Splunk where I need to pull about 3 million records from a database and correlate them with a CSV lookup I already have in Splunk. I am not looking to index the database data in Splunk, only to use it for correlation. Which of these would be the better approach?
| dbxquery ... | lookup csvFileName.csv ... | outputlookup xyz.csv
| inputlookup csvFileName.csv | dbxlookup connection= query= ..... | outputlookup xyz.csv
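For reference, a slightly fuller sketch of the two pipelines. The connection name, SQL, and field names below are placeholders, not actual configuration. Scenario 1 pulls the whole result set with dbxquery and enriches it from the CSV:

```
| dbxquery connection="my_db_conn" query="SELECT id, host, status FROM big_table"
| lookup csvFileName.csv id OUTPUT owner
| outputlookup xyz.csv
```

Scenario 2 starts from the CSV and enriches each row from the database with dbxlookup (note the generating command has to be inputlookup, since lookup on its own needs events to act on):

```
| inputlookup csvFileName.csv
| dbxlookup connection="my_db_conn" query="SELECT id, status FROM big_table" lookup id OUTPUT status
| outputlookup xyz.csv
```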
Will there be any performance impact on the database or on the Splunk server from running these queries to retrieve millions of records?
Hi @arrangineni ,
Did you get it working? How did the testing go? I am also planning to use dbxlookup
for a large correlation.
We could work together
Thanks
Nawaz
Hi,
you should read the first part of this page to make sure you're selecting the right design: http://docs.splunk.com/Documentation/DBX/3.1.3/DeployDBX/Createandmanagedatabaselookups
With such a large lookup, you're going to end up with an index anyway, either an automatic summary or a KV store... which effectively means "copy from one database into another database".
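If you do go the KV store route, the lookup definition itself is only a little configuration. A minimal sketch, assuming an app of your own and hypothetical collection and field names:

```
# collections.conf (in your app)
[xyz_collection]

# transforms.conf
[xyz_kvstore_lookup]
external_type = kvstore
collection    = xyz_collection
fields_list   = _key, id, host, status
```

Your search would then end in `| outputlookup xyz_kvstore_lookup` instead of writing a CSV, which avoids shipping a 3-million-row file around in the knowledge bundle.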
Thanks for your reply. May I know if there is a maximum row limit when pulling from the database in the second scenario, using Splunk DB Connect lookups? Will retrieving 3 million records in a single run impact performance on the Splunk side or the database side?
There's always a maximum somewhere... 3 million rows is large enough to cause bundle replication challenges, for instance. DBX doesn't have a built-in limit that I'm aware of, but you could certainly produce a query that's bigger than the database is willing to return.
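On the bundle replication point: if a huge CSV does end up in your app, one common mitigation is to exclude it from knowledge bundle replication to the indexers. A sketch, assuming the file sits in the search app (the stanza name and value below are illustrative; the value is treated as a regex against the path under $SPLUNK_HOME/etc, so adjust it to your deployment):

```
# distsearch.conf on the search head
[replicationBlacklist]
huge_lookup = apps[/\\]search[/\\]lookups[/\\]xyz\.csv
```

Only do this if the lookup is not needed at search time on the indexers (e.g. it is only read on the search head).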