Solved: Re: Why is using a timestamp column for the rising...

jcoates_splunk · ‎05-05-2016

There's a lot of database tables where the date and time stamp is the only thing that looks anything like a rising column. That can work and it is supported, but it's not the best scenario. Here's why.

jcoates_splunk · ‎05-05-2016

Q: Why is using a timestamp column for rising column suboptimal?
A: Rising columns are used to select subsets of data, using the construction SELECT THINGS FROM TABLE WHERE RISING_COLUMN > LAST_OBSERVED_VALUE. Doing this makes the following assumptions:

There is never more than one row created in a given time. If you're setting the time to a one second level of resolution and getting five records per second, you're going to lose four records from every run.
All of the times in the table are in a single format from a single timezone, in which daylight savings and clock corrections never occur.

Q: What if we switch the rising column logic from greater than (>) to greater than or equal to (>=)?
A: That will prevent data loss, but it will produce duplications instead.

Q: What would be better to do instead?
A: Ideally the database will provide a row id for the table in question. This makes the database responsible for duplication avoidance, and all modern databases have good support for ensuring that the data will be correct. This also takes timezones and time formatting out of the question. If there isn't a row id, using a view to produce one is a powerful option.

Q: But I don't have a choice!
A: Go ahead and do what you gotta do, don't let perfect be the enemy of good... but it's best to be aware of the drawbacks instead of being surprised by them.

View solution in original post

jcoates_splunk · ‎05-05-2016

Q: Why is using a timestamp column for rising column suboptimal?
A: Rising columns are used to select subsets of data, using the construction SELECT THINGS FROM TABLE WHERE RISING_COLUMN > LAST_OBSERVED_VALUE. Doing this makes the following assumptions:

There is never more than one row created in a given time. If you're setting the time to a one second level of resolution and getting five records per second, you're going to lose four records from every run.
All of the times in the table are in a single format from a single timezone, in which daylight savings and clock corrections never occur.

Q: What if we switch the rising column logic from greater than (>) to greater than or equal to (>=)?
A: That will prevent data loss, but it will produce duplications instead.

Q: What would be better to do instead?
A: Ideally the database will provide a row id for the table in question. This makes the database responsible for duplication avoidance, and all modern databases have good support for ensuring that the data will be correct. This also takes timezones and time formatting out of the question. If there isn't a row id, using a view to produce one is a powerful option.

Q: But I don't have a choice!
A: Go ahead and do what you gotta do, don't let perfect be the enemy of good... but it's best to be aware of the drawbacks instead of being surprised by them.

jcoates_splunk · ‎08-22-2016

Adding another failure scenario that has been coming up recently... the other assumption of doing this is that you're always accurately parsing your string into a datetime format. Say you set rising column to a VARCHAR datetime... in English, the lexical order of month names is not the same as the chronological order, so your input stops every month.

jcoates_splunk · ‎08-25-2016

And, because every conversation on this topic ends up going here, let's link this: http://docs.splunk.com/Documentation/DBX/2.3.0/DeployDBX/SQLtipsandtricks#Use_CAST_or_CONVERT_to_imp...

BenTan · ‎10-06-2016

Hi Jack,

This is none related to the question itself, but I am facing an issue with the sp13audit.exe that is being packaged inside Splunk Add-On for Microsoft SharePoint and there seems to be no fixes online. Would you mind sharing the source code of the exe file if it's not too much of a trouble? We are really trying to get the app up and running.

Really appreciate if you could lend a helping hand.

Thanks!

And sorry for the none related post again!

Why is using a timestamp column for the rising column in DB Connect suboptimal?

.conf25 Registration is OPEN!

Detecting Cross-Channel Fraud with Splunk

Splunk at Cisco Live 2025: Learning, Innovation, and a Little Bit of Mr. Brightside

Are you a member of the Splunk Community?

Why is using a timestamp column for the rising column in DB Connect suboptimal?

.conf25 Registration is OPEN!

Detecting Cross-Channel Fraud with Splunk

Splunk at Cisco Live 2025: Learning, Innovation, and a Little Bit of Mr. Brightside