Solved: About using "rising column" in DBconnect v3.

yutaka1005 · ‎11-23-2017

I'm thinking to get data from MSSQL server by using DBconnect.
Then I want get only new data by using "rising column".

By the way, there is one point which concern me.

When acquiring new data using "rising column", and using time columns, if data has the same time value with other data is added, can Splunk capture it?

Example:
Time Name
2017/11/24 10:00:00 John ← Already indexed
2017/11/24 10:15:00 Mark ← Already indexed
2017/11/24 10:15:00 Bob ← New!

Also, if it is unenable, is there any good method?

If someone knows about it, I would appreciate it if you could tell me.

jplumsdaine22 · ‎11-24-2017

I don't speak for the developers, but from conversations I've had with some of them I know that using time for the rising column drives them crazy. Consider: normally your query would look something like this: SELECT * FROM FOO WHERE date > ? order by date ASC ;

The value substituted for ? comes directly from the file $SPLUNK_HOME/var/lib/splunk/modinputs/server/splunk_app_db_connect/$connection_name .

If you're using time for a rising column in that file you will see lines like this:

{"value":"2017-11-09 04:46:32","appVersion":"3.1.1","columnType":93,"timestamp":"2017-11-17T04:46:34.357Z"}

The value field is what gets substitued for ?. (The timestamp field is the time the checkpoint was written) So in your example above, after the first run of the input you might see this in the checkpoint file

 {"value":"2017/11/24 10:15:00","appVersion":"3.1.1","columnType":93,"timestamp":"2017-11-17T04:46:34.357Z"}

The next time the query runs, what gets sent to the database is this

 SELECT * FROM FOO WHERE date > '2017/11/24 10:15:00' order by date ASC ;

So you are correct, the entry 2017/11/24 10:15:00 Bob will be ignored. This is a feature rather than a bug. The only way to guarantee you are getting the correct data is to use a unique key for the rising column, and a timestamp will rarely be a unique key constraint on a database table. Your database admins might be able to point you to the actual primary key of the table you are interested in - that is the only way to guarantee uniqueness. Unfortunately, the primary key is often simply an auto incrementing row number, which, if the table gets pruned/purged over time, will lead to you missing or duplicating rows. If you must use timestamps as a rising column, try to use one with mili or microsecond precision - that way you are very unlikely to have duplicate entries.

If you use timestamps with only second precision then to avoid losing data try changing your query to use the >= operator. In your example that will mean indexing 2017/11/24 10:15:00 Mark twice, but you won't miss 2017/11/24 10:15:00 Bob

Alternatively, if the database can take it have the DBA's create an index or view that does have unique keys for you to use.

View solution in original post

jplumsdaine22 · ‎11-24-2017

I don't speak for the developers, but from conversations I've had with some of them I know that using time for the rising column drives them crazy. Consider: normally your query would look something like this: SELECT * FROM FOO WHERE date > ? order by date ASC ;

The value substituted for ? comes directly from the file $SPLUNK_HOME/var/lib/splunk/modinputs/server/splunk_app_db_connect/$connection_name .

If you're using time for a rising column in that file you will see lines like this:

{"value":"2017-11-09 04:46:32","appVersion":"3.1.1","columnType":93,"timestamp":"2017-11-17T04:46:34.357Z"}

The value field is what gets substitued for ?. (The timestamp field is the time the checkpoint was written) So in your example above, after the first run of the input you might see this in the checkpoint file

 {"value":"2017/11/24 10:15:00","appVersion":"3.1.1","columnType":93,"timestamp":"2017-11-17T04:46:34.357Z"}

The next time the query runs, what gets sent to the database is this

 SELECT * FROM FOO WHERE date > '2017/11/24 10:15:00' order by date ASC ;

So you are correct, the entry 2017/11/24 10:15:00 Bob will be ignored. This is a feature rather than a bug. The only way to guarantee you are getting the correct data is to use a unique key for the rising column, and a timestamp will rarely be a unique key constraint on a database table. Your database admins might be able to point you to the actual primary key of the table you are interested in - that is the only way to guarantee uniqueness. Unfortunately, the primary key is often simply an auto incrementing row number, which, if the table gets pruned/purged over time, will lead to you missing or duplicating rows. If you must use timestamps as a rising column, try to use one with mili or microsecond precision - that way you are very unlikely to have duplicate entries.

If you use timestamps with only second precision then to avoid losing data try changing your query to use the >= operator. In your example that will mean indexing 2017/11/24 10:15:00 Mark twice, but you won't miss 2017/11/24 10:15:00 Bob

Alternatively, if the database can take it have the DBA's create an index or view that does have unique keys for you to use.

yutaka1005 · ‎11-26-2017

Hi jplumsdaine22,

Thank you for answer.

Thanks to your very polite explanation, I fully understood.
As described in your answer, if I have to use datetime format column as "rising column", I will use the timestamp has mili or microsecond precision or ">=".

Finally please let me ask one more question.

If I have unique sequence number column in data table, but it will recycle by max number, then how do you set up?

jplumsdaine22 · ‎11-27-2017

Good question! My advise to you is to join the splunk slack (see this question for details https://answers.splunk.com/answers/443734/is-there-a-splunk-slack-channel.html) and find the experts on the #dbconnect channel. They'll be able to help you out

gjanders · ‎11-27-2017

If you do find a useful answer please post it, as I'm also curious how this should be done!

-
Alerts for Splunk Admins, Version Control for Splunk, Decrypt2 VersionControl For SplunkCloud

harsmarvania57 · ‎11-24-2017

You need to configure rising column which has unique value, we ran into same problem For example 1 row inserted in DB table at 10:15:00 (For example at 100ms) and splunk fetch data at 10:15:00 (For example at 200 ms) and then new row inserted in DB at 10:15:00 (For example at 500 ms) then splunk set checkpoint value as 10:15:00 because you don't have millisecond in your datetime rising column so your event with Data "Bob" will be missed on next DB Input schedule run.

So please set rising column as column which has unique value and does not repeat those value in column.

gjanders · ‎11-23-2017

From what I understand the checkpoint system of DB connect records the largest value of the column that it saw.

So if the query ran at 10:05AM it would capture the raw with "John".
If it runs after 10:15:00 then it should capture the other two rows

If the row for Bob is added after 10:15:00 with the timestamp of 10:15:00 and the script in DB connect had run and captured the row with John / 10:15 in it, then you might miss a row, however if the row with Bob was added afterward I would expect a different timestamp...

I wouldn't mind official comments from the DB connect team or someone else to confirm this as well...

-
Alerts for Splunk Admins, Version Control for Splunk, Decrypt2 VersionControl For SplunkCloud

About using "rising column" in DBconnect v3.

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Network to App: Observability Unlocked [May & June Series]

SPL2 Deep Dives, AppDynamics Integrations, SAML Made Simple and Much More on Splunk ...

[Puzzles] Solve, Learn, Repeat: Matching cron expressions

Join the Conversation