Splunk Enterprise

Splunk DB Connect timeout - Unable to read next record

edoardo_vicendo
Contributor

Hello,

We had this error on an output query set-up on Splunk DB Connect.

Basically the Splunk query is inserting data into an external database.

 

 

2023-11-08 01:58:32.712 +0100  [QuartzScheduler_Worker-9] ERROR org.easybatch.core.job.BatchJob - Unable to read next record
java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[836463,5]
Message: Premature EOF
                at com.splunk.ResultsReaderXml.getNextEventInCurrentSet(ResultsReaderXml.java:128)
                at com.splunk.ResultsReader.getNextElement(ResultsReader.java:87)
                at com.splunk.ResultsReader.getNextEvent(ResultsReader.java:64)
                at com.splunk.dbx.server.dboutput.recordreader.DbOutputRecordReader.readRecord(DbOutputRecordReader.java:82)
                at org.easybatch.core.job.BatchJob.readRecord(BatchJob.java:189)
                at org.easybatch.core.job.BatchJob.readAndProcessBatch(BatchJob.java:171)
                at org.easybatch.core.job.BatchJob.call(BatchJob.java:101)
                at org.easybatch.extensions.quartz.Job.execute(Job.java:59)
                at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
                at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[836463,5]
Message: Premature EOF
                at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
                at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
                at com.splunk.ResultsReaderXml.getResultKVPairs(ResultsReaderXml.java:306)
                at com.splunk.ResultsReaderXml.getNextEventInCurrentSet(ResultsReaderXml.java:124)
                ... 9 common frames omitted

 

 

The issue was related to a query timeout. We have set-up the upsert_id in the Splunk DB Connect output configuration so that Splunk can go in insert_update.

Looking into _internal log we understood that Splunk, when using the upsert_id, performs a select query for each record it has to insert and then commits every 1000 records (by default):

 

2023-11-10 01:22:28.215 +0100  [QuartzScheduler_Worker-12] INFO  com.splunk.dbx.connector.logger.AuditLogger - operation=dboutput connection_name=SPLUNK_CONN stanza_name=SPLUNK_OUTPUT state=success sql='SELECT FIELD01,FIELD02,FIELD03 FROM MYSCHEMA.MYTABLE WHERE UPSERT_ID=?'

 

 

 

2023-11-10 01:22:28.258 +0100  [QuartzScheduler_Worker-12] INFO  com.splunk.dbx.connector.logger.AuditLogger - operation=dboutput connection_name=SPLUNK_CONN stanza_name=SPLUNK_OUTPUT state=success sql='INSERT INTO MYSCHEMA.MYTABLE (FIELD01,FIELD02,FIELD03) values (?,?,?)'

 

 

Upsert_id is very useful to avoid an sql duplicate key error, and whenever you want to recover data in case the output is failing for some reason. You basically re-run the output query and if the record already exists it is replaced in the sql table.

But the side effect is that the WHERE condition of the SELECT statement can be very inefficient if the Database table start to be huge.

The solution is to create in the output Database table an SQL index on the upsert_id field.

 

The output run passed from 11 minutes to 11 seconds, avoiding to hit the timeout of the Splunk DB Connect (30 seconds by default, calculated for every commit).

 

Best Regards,

Edoardo

_JP
Contributor

Thanks for the tip!  Non-streaming  type pushes like this are often a challenge, and this is one way to manage the coupling of something that likes to be working in the real-time space (Splunk) versus  more of a batch space (the DB).

Get Updates on the Splunk Community!

Detecting Remote Code Executions With the Splunk Threat Research Team

WATCH NOWRemote code execution (RCE) vulnerabilities pose a significant risk to organizations. If exploited, ...

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

The Splunk Community Dashboard Challenge is underway! This is your chance to showcase your skills in creating ...

.conf24 | Session Scheduler is Live!!

.conf24 is happening June 11 - 14 in Las Vegas, and we are thrilled to announce that the conference catalog ...