"ERROR OutputUtil - Error when doing roll transaction" when indexers try to archive buckets to S3 with Hadoop Data Roll in 6.5.0: how to fix?

heroku_curzonj
Explorer

We are getting a bunch of the following errors as our AWS EC2 indexers try to archive buckets to S3 with Hadoop Data Roll.

How can we fix them? Or will the failed buckets be retried so that we can ignore the errors, and if so, how do we ignore them? Many buckets are archived successfully, so this error doesn't happen with every bucket.

2016-10-20 17:50:56.319 +0000 ERROR OutputUtil - Error when doing roll transaction: roll_route=" from splunk_index=main, to virtual_index=main_archive" bucket="db_1476763064_1476758037_5211" exception="Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK" 
com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCopyObjectResponse(XmlResponsesSaxParser.java:417)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:192)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:189)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
    ... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
    at sun.security.ssl.InputRecord.read(InputRecord.java:480)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:944)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:901)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:251)
    at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:209)
    at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:171)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:154)
    at java.io.BufferedReader.read1(BufferedReader.java:205)
    at java.io.BufferedReader.read(BufferedReader.java:279)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
    ... 20 more

kschon_splunk
Splunk Employee

I don't know what is causing this error, but as for your question about whether the buckets will be retried, the answer is yes. Every time Hadoop Data Roll runs (by default, once per hour), it will retry copying all buckets that failed before. I would be interested in what percentage of your buckets are resulting in errors. If you click on Settings -> Virtual indexes -> Archived Indexes -> View Dashboards, you should be able to see how many buckets have been copied over the last day or two, and how many errors have occurred. Do the numbers look similar? And is the same bucket showing up in more than one error?
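
To check whether the same bucket shows up in more than one error, the log lines can be tallied per bucket. The following is only a rough Python sketch, assuming the errors have been exported as plain text lines in the format shown in the question. The function names and the sample lines (apart from the first bucket ID, which comes from the question) are made up for illustration.

```python
import re
from collections import Counter

# Matches the roll-error lines shown in the question and captures the bucket id.
ROLL_ERROR = re.compile(
    r'ERROR OutputUtil - Error when doing roll transaction:.*?bucket="([^"]+)"'
)


def count_roll_errors(lines):
    """Return a Counter mapping bucket id -> number of roll errors logged."""
    counts = Counter()
    for line in lines:
        match = ROLL_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_failures(counts, threshold=2):
    """Bucket ids that failed at least `threshold` times, sorted by id."""
    return sorted(b for b, n in counts.items() if n >= threshold)


# Sample input. The exception text is elided and the second bucket id is hypothetical.
_PREFIX = (
    "ERROR OutputUtil - Error when doing roll transaction: "
    'roll_route=" from splunk_index=main, to virtual_index=main_archive" '
)
sample = [
    "2016-10-20 17:50:56.319 +0000 " + _PREFIX
    + 'bucket="db_1476763064_1476758037_5211" exception="(elided)"',
    "2016-10-20 17:51:02.000 +0000 " + _PREFIX
    + 'bucket="db_0000000002_0000000001_1" exception="(elided)"',
    "2016-10-20 18:50:56.000 +0000 " + _PREFIX
    + 'bucket="db_1476763064_1476758037_5211" exception="(elided)"',
    "2016-10-20 18:51:00.000 +0000 INFO SomethingElse - unrelated line",
]

if __name__ == "__main__":
    counts = count_roll_errors(sample)
    for bucket in repeated_failures(counts):
        print(bucket, counts[bucket])
```

A bucket that appears once and then disappears from later runs was most likely picked up by a retry; a bucket that keeps reappearing run after run is the one worth investigating.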


heroku_curzonj
Explorer

I wrote a dashboard that joins the bucket archive logs with dbinspect to audit that every bucket is archived, and they are. Every bucket that encounters an error gets retried successfully shortly after.

The big question now is: can I stop the search UI from putting those error messages in the Messages menu? They make my users very nervous.


svelagala
Loves-to-Learn

@heroku_curzonj Can you please share the query that compares the count of buckets on the Splunk indexers older than 90 days from the current date with the count of buckets archived by Hadoop Data Roll?

I recently archived the buckets of the _internal index (older than 90 days) from one site of Splunk indexers to a Hadoop cluster using https://docs.splunk.com/Documentation/Splunk/8.0.3/Indexer/ArchivingindexestoHadoop.
I can see the buckets copied to the Hadoop cluster, and I am able to view events from the archived index.

My problem is that the dashboards under Settings -> Virtual indexes -> Archived Indexes -> View Dashboards show a higher bucket count in the Hadoop cluster than on the Splunk indexers.

I used the SPL query "| dbinspect index=_internal | stats count by splunk_server | addcoltotals" with a time range older than 90 days.

Please help me understand what went wrong in my approach, or share the exact query to compare the bucket counts between the archived index and the Splunk index.
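
When the two counts disagree, comparing bucket IDs instead of totals shows which side has the extra buckets. The following is only a sketch in Python, assuming the bucket IDs from the indexers and from the archive have already been exported as two plain lists. The function name and the sample IDs are hypothetical, and this is not the original poster's dashboard query.

```python
def audit_archive(indexer_buckets, archived_buckets):
    """Compare two collections of bucket ids and report what each side lacks.

    Duplicates within either input are ignored, because only set membership
    matters for the audit.
    """
    indexer = set(indexer_buckets)
    archived = set(archived_buckets)
    return {
        "missing_from_archive": sorted(indexer - archived),
        "only_in_archive": sorted(archived - indexer),
    }


if __name__ == "__main__":
    # Hypothetical bucket ids, for illustration only.
    on_indexers = ["db_3_1_10", "db_6_4_11", "db_9_7_12"]
    in_archive = ["db_6_4_11", "db_9_7_12", "db_12_10_13", "db_12_10_13"]
    report = audit_archive(on_indexers, in_archive)
    print("missing from archive:", report["missing_from_archive"])
    print("only in archive:", report["only_in_archive"])
```

Note that the archive input above contains a duplicate entry. Deduplicating before comparing keeps repeated IDs in an export from inflating one side of the comparison.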


heroku_curzonj
Explorer

@svelagala I'm sorry, I no longer work with the Splunk systems and don't have notes on this topic anymore.


pj
Contributor

Did you get an answer on how to suppress these bulletin messages?


heroku_curzonj
Explorer

No, we never did get an answer. We just told users to ignore them often enough that they eventually listened. These errors are still present today.
