Getting Data In

DB Connect: Why am I missing some DB entries via dbmon-tail using a rising column with duplicate timestamps?

johann_brajer
New Member

Hi experts,

I am trying to find out why i am missing some DB entries in my splunklogs from my "postgresql" DB via DBConnect (DBX) "dbmon-tail" ?
thing is that i can only use a timestamp as "Rising Column" but for some entries the timestamp seems to be the same duplicate timestamps - for the same time more entries).
my DB uses epochtime as timestamp.

Could it be that i have a problem with my rising Culomn and therefore missing entries...as i have more entries at exact the same time in seconds?

When i am trying to do a query of the DB table i am getting all the entries...
only if i to "dbmon-tail" i miss them....

my sql-query Config in splunk looks like (of course these query does not make much sense but i cleared some entries):


*select
scheduled_job_run.started_at, #timeColumn – in EpochTime
scheduled_job_run.last_filename from
job_run_stat,
scheduled_job_run, where
job_run_stat.job_id=scheduled_job_run.job_id AND scheduled_job_run.started_at >= (now() - '3 hour'::INTERVAL)

just a security to not query the whole DB in case of a long "query-break" {{AND $rising_column$ > ?}}*


and my DBX Config looks like: Im Config-File habe ich folgendes geändert:


[dbmon-tail://x.x.x.x/Test] host = hostname interval = auto output.format
= kv output.timestamp = 1 output.timestamp.format = yyyy-MM-dd HH:mm:ss output.timestamp.parse.format
= yyyy-MM-dd HH:mm:ss query = select \r\n to_char(scheduled_job_run.started_at,'YYYY-MM-DD HH24:MI:SS') log_date,\r\n scheduled_job_run.started_at,\r\n scheduled_job_run.last_filename \r\n from \r\n job_run_stat,\r\n scheduled_job_run,\r\n where \r\n scheduled_job_run.started_at >= (now()
- '3 hour'::INTERVAL) \r\n {{AND $rising_column$ > ?}} sourcetype = sourcetype_name table = job_run_stat_interval tail.rising.column = log_date


is there a way to guarantee the query of all entries of DB??

Hope you have some good hints for me 🙂

Regards jo

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Using a rising column that can have duplicates (more precisely, a column that is not strictly monotonically rising) is quite problematic. You're probably filtering WHERE $rising_column$ > ?, so if you get one row with value X it'll ignore other rows with value X that are inserted later.

The best way to solve this is to have a strictly monotonically rising column. Talk to your DBA if there's stuff you may not know about that can be used, or talk to your developers if they can add such a column.

A neat way to get around this whole mess is to introduce a delay/offset into your query. If you know the values inserted into the table are at most T seconds old then you could add a condition to the WHERE that only looks at rows older than T seconds. This way you will not look at timestamps that have been partially written, ie timestamps that might get a later row inserted with the same timestamp.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...