All Apps and Add-ons

Splunk DB Connect 2 - dbxquery only returns 1001 rows

Contributor

I'm trying to return all of the data from a table in a Microsoft SQL database using the sqljdbc4.jar. Running

 SELECT * from table

in SQL Management Studio correctly returns >11,000 records. However, when I try to run the same search from Splunk via dbxquery:

| dbxquery connection=landesk shortnames=t maxrows=15000 query="SELECT%20*%20from%20table"

Splunk returns exactly 1001 rows, regardless of any maxrows value greater than 1000. It seems like it's trying to page the results, but dbxquery has no paging control... What am I missing?
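The `%20` sequences in the query string are just URL encoding of spaces. If you're building the encoded query programmatically, Python's standard `urllib.parse.quote` produces the same form (a quick sketch; note that `*` must be marked safe, or `quote` will escape it too):

```python
from urllib.parse import quote

# Encode a SQL statement for use in dbxquery's query="" parameter.
# quote() escapes '*' by default, so mark it as a safe character here.
sql = "SELECT * from table"
encoded = quote(sql, safe="*")
print(encoded)  # SELECT%20*%20from%20table
```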

1 Solution

Contributor

So dbxquery.py has this in its code:

from splunk import rest
from splunk import util

from functools import partial

REST_BASE='db_connect'
MAX_ROWS=1001

....

def generate(self):
        ....
        maxrows_opt = int(self.maxrows or 0)
        maxrows = MAX_ROWS
        ....
        if maxrows_opt and maxrows_opt < MAX_ROWS:
            maxrows = maxrows_opt

        ....

So it seems there's an intentional 1001-row hard limit in place. Anyone care to elaborate why? I was able to change this value to 100001 and pull in all of my values with no errors. I'm sure there are cases where we want to gate the rows imported (paging, or based on an incrementing column), but we need more flexibility to work directly with the data.
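Reading the excerpt above, the effective row limit can be modeled like this (a minimal sketch of the capping logic, not the actual DB Connect source; the function name is mine):

```python
MAX_ROWS = 1001  # hard-coded ceiling in dbxquery.py

def effective_maxrows(maxrows_opt: int) -> int:
    """Mirror the capping logic: the user's maxrows only takes
    effect when it is *below* the hard-coded MAX_ROWS ceiling."""
    maxrows = MAX_ROWS
    if maxrows_opt and maxrows_opt < MAX_ROWS:
        maxrows = maxrows_opt
    return maxrows

print(effective_maxrows(15000))  # 1001 -- anything >= 1001 is ignored
print(effective_maxrows(500))    # 500  -- smaller values are honored
```

This is why raising maxrows above 1000 had no visible effect: the option is only consulted when it would *lower* the limit.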

There may be a different way of thinking about this problem, but my goal was simply to access the data directly and extract it into a state table (csv lookup) for Enterprise Security. The built-in data inputs / data lookups didn't seem to exactly fit the bill.

View solution in original post

Splunk Employee

Greetings LANDesk user! DBX 2.3.0 should no longer have these limitations, please give it a shot.

Explorer

DBX still has a default maxrows of 100,000. If you want to overcome that limit and return more rows, pass the 'maxrows' parameter explicitly, like this: '| dbxquery maxrows=100001'

Engager

Agreed - forcing the use of DB Input has two side effects:

1) Ingesting the entire dataset into a Splunk index from the source DB - data duplication, synchronization issues, and ETL is painful when datasets are large (think Phoenix on HBase!)
2) DB Input counts toward one's Splunk quotas - sorry, that's double dipping...

Communicator

It seems like the latest version upped this limit to 5001, but the problem bit me too, and I was going crazy trying to figure out why my query was behaving differently between Splunk and the mysql client. I also raised the limit inside the code to the maximum allowed value for maxSetRows, which seems to be 50,000,000 for this particular driver.

Engager

Agreed. We ended up changing the hardcoded 1001 to 1,000,000 to get around this issue, but we'll have to remember to re-apply the fix every time we upgrade. Hard-limiting the rows to 1001 makes the app pointless for most practical use - it's more than just a "preview" tool. This hard limit affects the dbxquery command as well.

Splunk Employee

dbxquery is intended for use as a preview tool, and isn't guaranteed to be around forever. If you want to get data from a relational database, I'd recommend using a dbinput (seems like batch mode would fit the bill) or dblookup.

Contributor

Yeah, I'm not sold on that...

a: that's not how it's documented
b: dbinput and dblookup both seem pointless for my use case - while relatively insignificant, using dbinput would count against index licensing limits when all I need is the current state of the data. dblookup requires mapping against a Splunk search... again, all I want to do is extract data from the SQL tables to build a current-state table.

Honestly, the only reason I looked at v2 of DB Connect is its support for Search Head clusters. If dbxquery isn't fully supported as an interface to extract data (despite the documentation), then I'll script another solution and uninstall...
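For what it's worth, the "script another solution" route is not much code. A minimal sketch of the extract-to-csv-lookup step, using Python's built-in sqlite3 as a stand-in for the real SQL Server connection (in practice you'd connect with a driver such as pyodbc; the table and file names here are made up for illustration):

```python
import csv
import sqlite3

def dump_table_to_lookup(conn, table, csv_path):
    """Pull every row of a table and write it out as a CSV state
    table, the shape Splunk expects for a csv lookup file."""
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cur)

# Demo with an in-memory database standing in for the source DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (host TEXT, owner TEXT)")
conn.executemany("INSERT INTO assets VALUES (?, ?)",
                 [("web01", "ops"), ("db01", "dba")])
dump_table_to_lookup(conn, "assets", "assets_lookup.csv")
```

The resulting file can be dropped into an app's lookups directory (or pushed via the lookup REST endpoint) without any rows being indexed.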

Engager

I'm going to have to agree with Jeff here. We use dbxquery all throughout our Splunk installation to show real-time data from our database. We have no interest in indexing this data, as we only care about what it looks like at this moment in time (for example, we use it heavily to show which help desk tickets currently need to be worked, and then launch our help desk system when someone clicks on one of those rows).

Contributor

Agree - not sure what the point of maxrows is if there's a hard-coded limit of 1001. The DBX GUI-based lookups and inputs don't allow for advanced SQL queries. Sometimes you just want to run a SQL query once per day and do an outputlookup or something. There shouldn't be a hard-coded limit like this.

Contributor

Actually - you can pass advanced SQL to get what you need. You just have to index the data first, and then you can do an outputlookup from there.
