Hi experts,
I am trying to find out why i am missing some DB entries in my splunklogs from my "postgresql" DB via DBConnect (DBX) "dbmon-tail" ?
thing is that i can only use a timestamp as "Rising Column" but for some entries the timestamp seems to be the same duplicate timestamps - for the same time more entries).
my DB uses epochtime as timestamp.
Could it be that i have a problem with my rising Culomn and therefore missing entries...as i have more entries at exact the same time in seconds?
When i am trying to do a query of the DB table i am getting all the entries...
only if i to "dbmon-tail" i miss them....
my sql-query Config in splunk looks like (of course these query does not make much sense but i cleared some entries):
*select
scheduled_job_run.started_at, #timeColumn – in EpochTime
scheduled_job_run.last_filename from
job_run_stat,
scheduled_job_run, where
job_run_stat.job_id=scheduled_job_run.job_id AND scheduled_job_run.started_at >= (now() - '3 hour'::INTERVAL)
and my DBX Config looks like: Im Config-File habe ich folgendes geändert:
[dbmon-tail://x.x.x.x/Test] host = hostname interval = auto output.format
= kv output.timestamp = 1 output.timestamp.format = yyyy-MM-dd HH:mm:ss output.timestamp.parse.format
= yyyy-MM-dd HH:mm:ss query = select \r\n to_char(scheduled_job_run.started_at,'YYYY-MM-DD HH24:MI:SS') log_date,\r\n scheduled_job_run.started_at,\r\n scheduled_job_run.last_filename \r\n from \r\n job_run_stat,\r\n scheduled_job_run,\r\n where \r\n scheduled_job_run.started_at >= (now()
- '3 hour'::INTERVAL) \r\n {{AND $rising_column$ > ?}} sourcetype = sourcetype_name table = job_run_stat_interval tail.rising.column = log_date
is there a way to guarantee the query of all entries of DB??
Hope you have some good hints for me 🙂
Regards jo
Using a rising column that can have duplicates (more precisely, a column that is not strictly monotonically rising) is quite problematic. You're probably filtering WHERE $rising_column$ > ?
, so if you get one row with value X it'll ignore other rows with value X that are inserted later.
The best way to solve this is to have a strictly monotonically rising column. Talk to your DBA if there's stuff you may not know about that can be used, or talk to your developers if they can add such a column.
A neat way to get around this whole mess is to introduce a delay/offset into your query. If you know the values inserted into the table are at most T seconds old then you could add a condition to the WHERE that only looks at rows older than T seconds. This way you will not look at timestamps that have been partially written, ie timestamps that might get a later row inserted with the same timestamp.