We are getting a bunch of the following errors as our AWS EC2 indexers try to archive buckets to S3 with Hadoop Data Roll.
How can we fix them? Or will they be retried automatically so that we can ignore them, and if so, how? Many buckets are still being archived successfully, so this error doesn't happen with every bucket.
2016-10-20 17:50:56.319 +0000 ERROR OutputUtil - Error when doing roll transaction: roll_route=" from splunk_index=main, to virtual_index=main_archive" bucket="db_1476763064_1476758037_5211" exception="Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK"
com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCopyObjectResponse(XmlResponsesSaxParser.java:417)
at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:192)
at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:189)
at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:944)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:901)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:251)
at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:209)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:171)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.read1(BufferedReader.java:205)
at java.io.BufferedReader.read(BufferedReader.java:279)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
... 20 more
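The first line of such an entry is a set of key="value" fields, and the most useful detail (the innermost exception) sits at the bottom of the trace. Below is a small Python sketch that pulls both out of a saved copy of one entry. It assumes the bucket directory name follows the db_<newest_epoch>_<oldest_epoch>_<local_id> convention. The function name parse_roll_error is hypothetical and not part of any Splunk tool.

```python
import re
from datetime import datetime, timezone

# key="value" pairs on the first line of a Splunk log entry, e.g. bucket="db_..."
KV_RE = re.compile(r'(\w+)="([^"]*)"')
# Assumed bucket directory naming: db_<newest_epoch>_<oldest_epoch>_<local_id>[_<guid>]
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)")


def parse_roll_error(entry_text):
    """Hypothetical helper: summarize one 'Error when doing roll transaction' entry."""
    lines = entry_text.splitlines()
    fields = dict(KV_RE.findall(lines[0]))
    bucket = fields.get("bucket", "")
    m = BUCKET_RE.match(bucket)
    if m is None:
        raise ValueError(f"no recognizable bucket name in: {lines[0]!r}")
    # The two epochs in the bucket name bound the event times stored in the bucket.
    newest, oldest = (datetime.fromtimestamp(int(s), tz=timezone.utc) for s in m.group(1, 2))
    # In a Java stack trace the last "Caused by:" line is the innermost (root) exception.
    causes = [
        line.strip()[len("Caused by:"):].strip()
        for line in lines
        if line.strip().startswith("Caused by:")
    ]
    return {
        "bucket": bucket,
        "roll_route": fields.get("roll_route", "").strip(),
        "oldest_event": oldest,
        "newest_event": newest,
        "root_cause": causes[-1] if causes else None,
    }
```

Applied to the entry above, this reports bucket db_1476763064_1476758037_5211, with events from about 02:33 to 03:57 UTC on 2016-10-18, and java.net.SocketTimeoutException: Read timed out as the innermost exception.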
I don't know what is causing this error, but as for your question about whether the buckets will be retried, the answer is yes. Every time Hadoop Data Roll runs (by default, once per hour), it retries copying every bucket that failed before. I would be interested in what percentage of your buckets are resulting in errors. If you click on Settings -> Virtual indexes -> Archived Indexes -> View Dashboards, you should be able to see how many buckets have been copied over the last day or two, and how many errors have occurred. Do the numbers look similar? And is the same bucket showing up in more than one error?
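To answer those two questions from raw data rather than from the dashboard, something like the following Python sketch could work. It assumes you have already exported two lists for the same time window: the bucket name from each roll ERROR event, and the names of the buckets that were copied successfully. How you collect those lists, and the function name, are hypothetical.

```python
from collections import Counter


def summarize_roll_errors(error_buckets, copied_buckets):
    """Hypothetical helper for one time window.

    error_buckets: bucket names, one entry per roll ERROR event (repeats allowed).
    copied_buckets: bucket names that were copied successfully in the same window.
    Returns (percentage of distinct buckets with at least one error,
             sorted list of buckets that appear in more than one error).
    """
    errors_per_bucket = Counter(error_buckets)
    all_buckets = set(errors_per_bucket) | set(copied_buckets)
    if not all_buckets:
        return 0.0, []
    pct_with_errors = 100.0 * len(errors_per_bucket) / len(all_buckets)
    repeated = sorted(b for b, n in errors_per_bucket.items() if n > 1)
    return pct_with_errors, repeated
```

The percentage is computed over distinct buckets rather than error events, so a bucket that fails twice before succeeding is counted once in the rate but shows up in the repeated list.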
I wrote a dashboard that joins the bucket archive logs with dbinspect to audit whether every bucket gets archived, and they all do. Every bucket that encounters an error is retried successfully shortly afterwards.
The big question now is whether I can stop the search UI from putting those error messages in the Messages menu. They make my users very nervous.
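The dashboard itself isn't shown here, but the audit it describes comes down to set operations on bucket names. Below is a hypothetical Python sketch of that logic. It assumes three lists of bucket names have already been exported: every bucket dbinspect reports, every bucket the archive logs record as copied, and every bucket that appeared in a roll error. The function and key names are illustrative only.

```python
def audit_archive(indexed_buckets, archived_buckets, errored_buckets):
    """Hypothetical audit: compare indexer buckets, archived buckets and errored buckets."""
    indexed = set(indexed_buckets)
    archived = set(archived_buckets)
    errored = set(errored_buckets)
    return {
        # Buckets the indexers hold that have no successful archive record yet.
        "not_archived": sorted(indexed - archived),
        # Buckets that hit an error at least once but were later archived.
        "retried_ok": sorted(errored & archived),
        # Buckets that errored and have not been archived since.
        "still_failing": sorted(errored - archived),
    }
```

An empty not_archived list together with an empty still_failing list corresponds to the situation described here, where every failed bucket is picked up on a later run.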
@heroku_curzonj Can you please share the query you used to compare the count of Splunk indexer buckets older than 90 days (relative to the current date) with the buckets archived by Hadoop Data Roll?
Recently I archived the buckets of the _internal index (older than 90 days) from one site of our Splunk indexers to a Hadoop cluster, following https://docs.splunk.com/Documentation/Splunk/8.0.3/Indexer/ArchivingindexestoHadoop.
I can see the buckets copied to the Hadoop cluster, and I am able to view events from the archived index.
My problem is that the dashboards under Settings -> Virtual indexes -> Archived Indexes -> View Dashboards show more buckets in the Hadoop cluster than I count on the Splunk indexers.
To count the indexer buckets, I used the SPL query "| dbinspect index=_internal | stats count by splunk_server | addcoltotals" with a time range of older than 90 days.
Please help me understand what went wrong in the approach above, or share the exact query for comparing the bucket count of the archived index with that of the Splunk index.
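A count per splunk_server cannot show which buckets differ. In an indexer cluster the same bucket may also be reported once for each peer that holds a copy, so totals from the two sides are not directly comparable. The Python sketch below compares bucket identities instead of totals. It assumes bucket directory names of the form db_<newest>_<oldest>_<local_id>[_<guid>], with replicated copies prefixed rb_. Where the two name lists come from is left open, and the helper names are hypothetical.

```python
import re

# Assumed convention: (db|rb)_<newest_epoch>_<oldest_epoch>_<local_id>[_<origin_guid>]
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_([0-9A-Fa-f-]+))?$")


def bucket_key(name):
    """Collapse every copy of a bucket (original db_ or replica rb_) to one key."""
    m = BUCKET_RE.match(name)
    if m is None:
        raise ValueError(f"not a bucket directory name: {name!r}")
    newest, oldest, local_id, guid = m.groups()
    return (int(newest), int(oldest), int(local_id), (guid or "").upper())


def compare_bucket_lists(indexer_names, archive_names):
    """Compare distinct buckets on the indexers with distinct buckets in the archive."""
    indexed = {bucket_key(n) for n in indexer_names}
    archived = {bucket_key(n) for n in archive_names}
    return {
        "indexer_rows": len(indexer_names),
        "distinct_indexed": len(indexed),
        "distinct_archived": len(archived),
        "only_in_archive": sorted(archived - indexed),
        "only_on_indexers": sorted(indexed - archived),
    }
```

The newest/oldest epochs in each only_in_archive key show which time ranges account for the difference, which can then be checked against the time range used for the dbinspect search.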
@svelagala I'm sorry, I no longer work with the Splunk systems and don't have my notes on this topic anymore.
Did you get an answer on how to suppress these bulletin messages?
No, we never got an answer. We told users to ignore them often enough that they eventually listened. The errors are still present today.