2021-05-05 12:20:20.032 +0000 [QuartzScheduler_Worker-16] ERROR c.s.d.s.task.listeners.RecordWriterMetricsListener - action=unable_to_write_batch
java.net.SocketTimeoutException: Read timed out
2021-05-05 12:20:20.032 +0000 [QuartzScheduler_Worker-16] ERROR org.easybatch.core.job.BatchJob - Unable to write records
java.net.SocketTimeoutException: Read timed out
2021-05-05 12:20:20.032 +0000 [QuartzScheduler_Worker-16] INFO org.easybatch.core.job.BatchJob - Job 'IPOD_UNBRICK_LOG' finished with status: FAILED
Did you happen to solve this?
Can you try increasing the "query_timeout" setting on the specific input you're running? It defaults to 30 seconds, so you may need to raise it and see what works for you.
db_inputs.conf
query_timeout = <integer>
# optional
# the maximum execution time of the SQL query; the default is 30 seconds
You'll see this setting in the GUI too in the inputs configuration part.
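For example, a raised timeout in db_inputs.conf might look like the sketch below; the stanza name and connection name are hypothetical placeholders, not settings from this thread:

```ini
# db_inputs.conf -- [my_db_input] and my_connection are placeholder names
[my_db_input]
connection = my_connection
query_timeout = 600
```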
Hello @m_pham, the query is already set to time out at a very high value (600 seconds in my case). In general, the query takes 15-20 seconds to run when executed from the UI, yet we continue to observe these errors. I'd also like to mention that there are no network issues, and most other inputs that are set to run around the same time work fine.
As a remedy, we've also increased the splunkdConnectionTimeout parameter in the [settings] stanza of web.conf. We did this because we see that a small portion of results is forwarded, and then the job fails. Below is the error stack trace -
2022-02-22 03:33:59.584 -0600 [QuartzScheduler_Worker-28] ERROR org.easybatch.core.job.BatchJob - Unable to write records
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at com.codahale.metrics.httpclient.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:44)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.splunk.dbx.server.dbinput.recordwriter.HttpEventCollector.uploadEventBatch(HttpEventCollector.java:122)
	at com.splunk.dbx.server.dbinput.recordwriter.HttpEventCollector.uploadEvents(HttpEventCollector.java:99)
	at com.splunk.dbx.server.dbinput.recordwriter.HttpEventCollectorLoadBalancer.uploadEvents(HttpEventCollectorLoadBalancer.java:49)
	at com.splunk.dbx.server.dbinput.recordwriter.HecEventWriter.writeRecords(HecEventWriter.java:36)
	at org.easybatch.core.job.BatchJob.writeBatch(BatchJob.java:203)
	at org.easybatch.core.job.BatchJob.call(BatchJob.java:79)
	at org.easybatch.extensions.quartz.Job.execute(Job.java:59)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
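For reference, the web.conf change mentioned above has this general shape; the 600-second value is just an example of a raised timeout, not a recommendation:

```ini
# web.conf -- splunkdConnectionTimeout sets (in seconds) how long Splunk Web
# waits on its connection to splunkd before giving up
[settings]
splunkdConnectionTimeout = 600
```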
Were you ever able to solve this issue? We have been struggling to deal with it. Making sure THP was turned off on the Linux server helped a lot, but we still see the write errors on occasion.
I'm no DB admin, but is there something that can be configured on the DB side that might help with this timeout issue? We sometimes focus on errors in Splunk's logs when the problem is actually on the upstream side.
Has anybody found a solution to this error? We're facing similar issues in our environment.
Did you ever have any luck with this? We're struggling with it too; the best mitigation so far has been making sure THP was turned off on the Linux server. We still experience issues a few times a week and are wondering whether others have found a more robust solution.
I believe you are running into queue blockage.
HecEventWriter.writeRecords
Have you checked the metrics log on the HF to see if you are blocking on the indexing queue?
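One quick way to check is to look for queue lines with blocked=true in metrics.log on the HF. The sketch below runs against a fabricated two-line sample so the command shape is clear; the sample lines only approximate the real format, and on an actual HF you would point grep at $SPLUNK_HOME/var/log/splunk/metrics.log* instead:

```shell
# Fabricated sample roughly in the shape of metrics.log queue lines (not real output)
cat > /tmp/metrics_sample.log <<'EOF'
05-05-2021 12:20:01.000 +0000 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=500
05-05-2021 12:20:01.000 +0000 INFO Metrics - group=queue, name=parsingqueue, max_size_kb=500, current_size_kb=12
EOF

# Count queue measurements reporting blockage; a steadily growing count points
# at downstream indexing pressure rather than the DB query itself.
grep -c 'blocked=true' /tmp/metrics_sample.log   # -> 1
```

If the HF forwards its own _internal logs, the equivalent Splunk search is along the lines of index=_internal source=*metrics.log* group=queue blocked=true, split by the name field to see which queue is filling up.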