All Apps and Add-ons

Splunk DB Connect 2 - dbxquery only returns 1001 rows

Contributor

I'm trying to return all of the data from a table in a Microsoft SQL database using the sqljdbc4.jar. Running

 SELECT * from table

in SQL Management Studio correctly returns >11,000 records. However, when I try to run the same search from Splunk via dbxquery:

| dbxquery connection=landesk shortnames=t maxrows=15000 query="SELECT%20*%20from%20table"

Splunk returns exactly 1001 rows, regardless of any maxrows value greater than 1000. It seems like it's trying to page the results, but dbxquery has no paging control... What am I missing?
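The `%20` sequences in the query string are just URL encoding of spaces. If you're building the encoded query programmatically, Python's standard `urllib.parse.quote` produces the same form (a quick sketch; note that `*` must be marked safe, or `quote` will escape it too):

```python
from urllib.parse import quote

# Encode a SQL statement for use in dbxquery's query="" parameter.
# quote() escapes '*' by default, so mark it as a safe character here.
sql = "SELECT * from table"
encoded = quote(sql, safe="*")
print(encoded)  # SELECT%20*%20from%20table
```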

1 Solution

Contributor

So dbxquery.py has this in its code:

from splunk import rest
from splunk import util

from functools import partial

REST_BASE='db_connect'
MAX_ROWS=1001

....

def generate(self):
        ....
        maxrows_opt = int(self.maxrows or 0)
        maxrows = MAX_ROWS
        ....
        if maxrows_opt and maxrows_opt < MAX_ROWS:
            maxrows = maxrows_opt

        ....

So it seems there's an intentional 1001-row hard limit in place. Anyone care to elaborate why? I was able to change this value to 100001 and pull in all of my values with no errors. I'm sure there are cases where we want to gate the rows imported (paging, or based on an incrementing column), but we need more flexibility to work directly with the data.
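Reading the excerpt above, the effective row limit can be modeled like this (a minimal sketch of the capping logic, not the actual DB Connect source; the function name is mine):

```python
MAX_ROWS = 1001  # hard-coded ceiling in dbxquery.py

def effective_maxrows(maxrows_opt: int) -> int:
    """Mirror the capping logic: the user's maxrows only takes
    effect when it is *below* the hard-coded MAX_ROWS ceiling."""
    maxrows = MAX_ROWS
    if maxrows_opt and maxrows_opt < MAX_ROWS:
        maxrows = maxrows_opt
    return maxrows

print(effective_maxrows(15000))  # 1001 -- anything >= 1001 is ignored
print(effective_maxrows(500))    # 500  -- smaller values are honored
```

This is why raising maxrows above 1000 had no visible effect: the option is only consulted when it would *lower* the limit.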

There may be a different way of thinking about this problem, but my goal was simply to access the data directly and extract it into a state table (csv lookup) for Enterprise Security. The built-in data inputs / data lookups didn't seem to exactly fit the bill.

View solution in original post

Splunk Employee

Greetings LANDesk user! DBX 2.3.0 should no longer have these limitations, please give it a shot.

Explorer

DBX still has a default maxrows of 100,000. If you want to overcome that limit and return more rows, pass the 'maxrows' parameter explicitly, like this: '| dbxquery maxrows=100001'

Engager

Agreed - forcing the use of DB Input has two side effects:

1) Ingesting the entire dataset into a Splunk index from the source DB - data duplication, synchronization issues, and ETL is painful when datasets are large (think Phoenix on HBase!)
2) DB Input counts toward one's Splunk quotas - sorry, that's double dipping...

Communicator

It seems like the latest version upped this limit to 5001, but the problem bit me too, and I was going crazy trying to figure out why my query was behaving differently between Splunk and the mysql client. I also raised the limit inside the code to the maximum allowed value for maxSetRows, which seems to be 50,000,000 for this particular driver.

Engager

Agreed. We ended up changing the hardcoded 1001 to 1,000,000 to get around this issue, but we'll have to remember to re-apply the fix every time we upgrade. Hard-limiting the rows to 1001 makes the app pointless for most practical use - it's more than just a "preview" tool. This hard limit affects the dbxquery command as well.

Splunk Employee

dbxquery is intended for use as a preview tool, and isn't guaranteed to be around forever. If you want to get data from a relational database, I'd recommend using a dbinput (seems like batch mode would fit the bill) or dblookup.

Contributor

Yeah, I'm not sold on that...

a: that's not how it's documented
b: dbinput and dblookup both seem pointless for my use case - while relatively insignificant, using dbinput would count against index licensing limits when all I need is the current state of the data. dblookup requires mapping against a Splunk search... again, all I want to do is extract data from the SQL tables to build a current-state table.

Honestly, the only reason I looked at v2 of DB Connect is its support for Search Head clusters. If dbxquery isn't fully supported as an interface to extract data (despite the documentation), then I'll script another solution and uninstall...
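For what it's worth, the "script another solution" route is not much code. A minimal sketch of the extract-to-csv-lookup step, using Python's built-in sqlite3 as a stand-in for the real SQL Server connection (in practice you'd connect with a driver such as pyodbc; the table and file names here are made up for illustration):

```python
import csv
import sqlite3

def dump_table_to_lookup(conn, table, csv_path):
    """Pull every row of a table and write it out as a CSV state
    table, the shape Splunk expects for a csv lookup file."""
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cur)

# Demo with an in-memory database standing in for the source DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (host TEXT, owner TEXT)")
conn.executemany("INSERT INTO assets VALUES (?, ?)",
                 [("web01", "ops"), ("db01", "dba")])
dump_table_to_lookup(conn, "assets", "assets_lookup.csv")
```

The resulting file can be dropped into an app's lookups directory (or pushed via the lookup REST endpoint) without any rows being indexed.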

Engager

I'm going to have to agree with Jeff here. We use dbxquery all throughout our Splunk installation to show real-time data from our database. We have no interest in indexing this data, as we only care about what it looks like at this moment in time (for example, we use it heavily to show which help desk tickets currently need to be worked, and then launch our help desk system when someone clicks on one of those rows).

Contributor

Agree - not sure what the point of maxrows is if there's a hard-coded limit of 1001. The DBX GUI-based lookups and inputs don't allow for advanced SQL queries. Sometimes you just want to run a SQL query once per day and do an outputlookup or something. There shouldn't be a hard-coded limit like this.

Contributor

Actually - you can pass advanced SQL to get what you need. You just have to index the data first, and then you can do an outputlookup from there.
