I have a DBX 3.1.2 job that's failing partway through. I don't get any error messages (everything is set to DEBUG level), just the following line in the metrics logs:
2018-05-03 12:06:37.976 -0400 INFO c.s.dbx.server.task.listeners.JobMetricsListener - action=collectjobmetrics connection=mydbconnection jdbcurl=null recordreadsuccesscount=3444 dbreadtime=397794 recordreaderrorcount=1 hecuploadtime=102 hecrecordprocesstime=13 formathecsuccesscount=3444 hecuploadbytes=1631645 status=FAILED inputname=mydbinput batchsize=1000 errorthreshold=N/A isjmxmonitoring=false starttime=2018-05-0312:00:00 endtime=2018-05-0312:06:37 duration=397965 readcount=3444 writecount=3000 filteredcount=0 errorcount=0
As you can see, not every record counted in readcount is making it into writecount. But when I search for error messages related to this input, I don't find anything beyond this line.
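For anyone eyeballing that metrics line, the shortfall can be pulled straight out of the key=value pairs. A minimal sketch (field names and values copied from the log above, line trimmed to the relevant fields):

```python
# Parse the key=value metrics from the JobMetricsListener line
# (trimmed to the fields that matter here) and compute the
# read/write shortfall.
metrics_line = (
    "status=FAILED readcount=3444 writecount=3000 "
    "batchsize=1000 recordreaderrorcount=1"
)

metrics = dict(pair.split("=", 1) for pair in metrics_line.split())

missing = int(metrics["readcount"]) - int(metrics["writecount"])
print(missing)  # 444 records read but never written
```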
Has anybody else had this problem? Where did you look?
3000 is a suspiciously round number and also a suspicious multiple of your batch_size.
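The arithmetic behind that hunch, as a quick sanity check (numbers taken from the log line; the batch-boundary interpretation is a guess, not something the metrics confirm):

```python
# Sketch: with batchsize=1000, writecount=3000 means exactly three
# full batches landed, and the fourth (partial) batch is the one
# that appears to have failed.
readcount, writecount, batchsize = 3444, 3000, 1000

full_batches, remainder = divmod(readcount, batchsize)  # 3 full batches, 444 left over
assert writecount == full_batches * batchsize           # failure sits on a batch boundary
print(f"batch {full_batches + 1} ({remainder} records) never made it")
```

If the write had died mid-batch you'd expect a writecount that isn't a clean multiple of batchsize, which is why the round 3000 points at the final partial batch.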
Also, that hecuploadtime of 102 — I hope that's in ms. If it's seconds, that seems awfully high for a few thousand records totaling a megabyte and a half.
Have you confirmed whether the right number of records actually made it into Splunk? I suspect it didn't, but maybe this is an error in the internal metrics?
I agree that it's a suspicious multiple.
Upload times are in ms as far as I can see... this is one of the most heavily taxed databases in the environment, so it's going to run a bit higher than one would like.
We have confirmed that Splunk is not receiving the right number of records; we are missing entries.