All Apps and Add-ons

Why is using a timestamp column for the rising column in DB Connect suboptimal?

jcoates_splunk
Splunk Employee
Splunk Employee

There's a lot of database tables where the date and time stamp is the only thing that looks anything like a rising column. That can work and it is supported, but it's not the best scenario. Here's why.

1 Solution

jcoates_splunk
Splunk Employee
Splunk Employee

Q: Why is using a timestamp column for rising column suboptimal?
A: Rising columns are used to select subsets of data, using the construction SELECT THINGS FROM TABLE WHERE RISING_COLUMN > LAST_OBSERVED_VALUE. Doing this makes the following assumptions:

  • There is never more than one row created in a given time. If you're setting the time to a one second level of resolution and getting five records per second, you're going to lose four records from every run.
  • All of the times in the table are in a single format from a single timezone, in which daylight savings and clock corrections never occur.

Q: What if we switch the rising column logic from greater than (>) to greater than or equal to (>=)?
A: That will prevent data loss, but it will produce duplications instead.

Q: What would be better to do instead?
A: Ideally the database will provide a row id for the table in question. This makes the database responsible for duplication avoidance, and all modern databases have good support for ensuring that the data will be correct. This also takes timezones and time formatting out of the question. If there isn't a row id, using a view to produce one is a powerful option.

Q: But I don't have a choice!
A: Go ahead and do what you gotta do, don't let perfect be the enemy of good... but it's best to be aware of the drawbacks instead of being surprised by them.

View solution in original post

jcoates_splunk
Splunk Employee
Splunk Employee

Q: Why is using a timestamp column for rising column suboptimal?
A: Rising columns are used to select subsets of data, using the construction SELECT THINGS FROM TABLE WHERE RISING_COLUMN > LAST_OBSERVED_VALUE. Doing this makes the following assumptions:

  • There is never more than one row created in a given time. If you're setting the time to a one second level of resolution and getting five records per second, you're going to lose four records from every run.
  • All of the times in the table are in a single format from a single timezone, in which daylight savings and clock corrections never occur.

Q: What if we switch the rising column logic from greater than (>) to greater than or equal to (>=)?
A: That will prevent data loss, but it will produce duplications instead.

Q: What would be better to do instead?
A: Ideally the database will provide a row id for the table in question. This makes the database responsible for duplication avoidance, and all modern databases have good support for ensuring that the data will be correct. This also takes timezones and time formatting out of the question. If there isn't a row id, using a view to produce one is a powerful option.

Q: But I don't have a choice!
A: Go ahead and do what you gotta do, don't let perfect be the enemy of good... but it's best to be aware of the drawbacks instead of being surprised by them.

jcoates_splunk
Splunk Employee
Splunk Employee

Adding another failure scenario that has been coming up recently... the other assumption of doing this is that you're always accurately parsing your string into a datetime format. Say you set rising column to a VARCHAR datetime... in English, the lexical order of month names is not the same as the chronological order, so your input stops every month.

0 Karma

jcoates_splunk
Splunk Employee
Splunk Employee

And, because every conversation on this topic ends up going here, let's link this: http://docs.splunk.com/Documentation/DBX/2.3.0/DeployDBX/SQLtipsandtricks#Use_CAST_or_CONVERT_to_imp...

BenTan
Path Finder

Hi Jack,

This is none related to the question itself, but I am facing an issue with the sp13audit.exe that is being packaged inside Splunk Add-On for Microsoft SharePoint and there seems to be no fixes online. Would you mind sharing the source code of the exe file if it's not too much of a trouble? We are really trying to get the app up and running.

Really appreciate if you could lend a helping hand.

Thanks!

And sorry for the none related post again!

0 Karma
Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...