
DB Connect Set Beginning rising_column and Limiting First Run

Path Finder

I want to start tailing a large table in MS SQL. I have done other tails with DB Connect, but they were small enough that the first run could pull the whole table and not create a problem. However, the one I'm trying to tail is very large, and contains older rows that I don't need. In order to do this, I think I need to do the following:

1) Start the rising_column number at an id from a few months ago

2) Limit the first run and subsequent runs to something like 10,000 rows

Is this possible using DB Connect?


Motivator

Here's how I do it: I just put WHERE myColumn > 234234 at the end of the query.

You will need to put the AND inside the brackets, as below.

Do not put the "WHERE myColumn > 234234" inside the {{ and rising_column...}}; leave it outside. If myColumn is your rising column, its value will always be greater than 234234 after the first run, so it doesn't hurt to leave the condition in there.

So it would be: WHERE myColumn > 234234 {{ AND rising_column > ? }}

You can also put "SELECT TOP 10000" in your query and ORDER BY your ID column ascending.

Make sense?
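Putting those pieces together, the full input query might look like this (a sketch only; the table and column names are hypothetical, and the {{ ... }} clause is the part DB Connect fills in on subsequent runs):

```sql
SELECT TOP 10000 *
FROM dbo.MyEvents            -- hypothetical table name
WHERE myColumn > 234234 {{ AND rising_column > ? }}
ORDER BY myColumn ASC        -- oldest first, so the checkpoint rises monotonically
```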



Path Finder

That makes sense, thanks for the answer. However, I had a pressing commitment to get this data indexed, so I brewed up a Python script to do the same thing. I would like to get DB Connect working in the future, as it seems easier.


Super Champion

Yes, you can do 1, but I don't see much point in doing 2.
Instead of rising_column > ?, change the ? to an epoch timestamp of the datetime of your choosing.
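For example, if the rising column holds epoch seconds, the seeded first-run clause might look like this (a sketch; the column name and starting timestamp are hypothetical):

```sql
-- instead of {{ AND rising_column > ? }}, seed the first run with a literal:
WHERE rising_column > 1388534400   -- 2014-01-01 00:00:00 UTC, as epoch seconds
```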

Alternatively, you could use the ? to initialize the database input, then immediately stop Splunk and update the state.xml to the value of your choosing; see this post:

http://answers.splunk.com/answers/91112/dbconnect-value-of-risingcolumn


Path Finder

I want to space out the draws. If there are tens of millions of rows from the start date until now, having Splunk pull all of them at once is going to cause trouble for my app.


Super Champion

Yeah, but that just delays things. Telling the indexer to get only 10,000 rows really means "get only 10,000 at a time"; it will still eventually collect every row above the rising-column value.
I have seen Splunk appear to stop indexing when I index a new DB with a dozen GB of backlog, but it wakes up on its own after a short time.


Path Finder

The point of number 2 is to keep DB Connect from overtaxing my DB server and from indexing more data than our license permits. Even if I set the beginning ID, pulling the last three months' worth of rows in one shot would slow our production DB server to a crawl.
