Getting Data In

Why this error after upgrade to 9.0 "ERROR TcpOutputQ [<thread id> TcpOutEloop] - Unexpected event id=<eventid>"?

hrawat_splunk
Splunk Employee

After upgrading to 9.0, I am seeing the following error:

ERROR TcpOutputQ [<thread id> TcpOutEloop] - Unexpected event id=<eventid>

0 Karma
1 Solution

hrawat_splunk
Splunk Employee

If useACK is set to true and batch mode is on (the default) in Splunk 9.0, the forwarder may log the following error messages:

"Unexpected event id"
"Invalid ACK received from indexer"
"Got unexpected ACK with eventid"

This may also lead to blocked queues on the forwarding tier.

While processing an event, autoLBVolume and autoBatch apply their limits using the raw size of the event. However, for events with no raw data ("raw-less" events, e.g. metrics events), autoLBVolume and autoBatch end up sending far more events to the receiver than the configured limits.
With autoLBVolume, this results in more events than expected/configured being distributed to receivers.

With autoBatch, this results in batches containing far more events than expected. That means that while a batch of thousands of events is still being sent to the receiver, some of its events are already being acknowledged.
The forwarder creates the list of events to be acknowledged only after it has successfully sent the batch. However, if the batch is still in flight at the TCP layer and the forwarder receives an ACK for an event in that batch, the event is not yet in the list of events expected to be acknowledged. That leads to the ERROR above.

Workaround: set either useACK=false or autoBatch=false.
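
As a rough illustration of the workaround, the fragment below shows where these settings might go in outputs.conf on the forwarder. Placing them in the global [tcpout] stanza is an assumption; a deployment that configures its outputs per target group would set them in that group's stanza instead. Only one of the two settings needs to be changed.

```ini
# outputs.conf on the forwarder (sketch; stanza placement is an assumption)
[tcpout]
# Option 1: turn off indexer acknowledgment
useACK = false

# Option 2 (instead of option 1): keep acknowledgment, turn off automatic batching
# autoBatch = false
```

Since useACK=false gives up indexer acknowledgment, autoBatch=false may be the less intrusive choice where acknowledgment is relied on.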

The issue is fixed in the 9.0.3 patch.

Note:
After upgrading to 9.0.3, you will still see the benign "Unexpected event id" log message. However, the following log messages should no longer appear:
"Invalid ACK received from indexer"
"Got unexpected ACK with eventid"


vinayakwagh
Engager

It helps, thanks.

 

0 Karma

woodcock
Esteemed Legend

It is back in v9.0.1

0 Karma

hrawat_splunk
Splunk Employee

See my updated answer. 9.0.1 still logs the ERROR, but it does not block the forwarder.

0 Karma

Sithima
Explorer

If the issue is fixed in 9.0.1, why am I getting the same error message in Splunk 9.0.1?

ERROR TcpOutputQ [<id> TcpOutEloop] - Unexpected eventid=<id>

0 Karma

hrawat_splunk
Splunk Employee

9.0.1 has not suppressed the ERROR log; it fixes the underlying tcpout queue blockage issue. If you see the ERROR log but no tcpout queue blockage (as was seen with 9.0.0), that indicates the blockage issue is resolved.

The benign ERROR log in 9.0.1 will be suppressed in a future release.

woodcock
Esteemed Legend

Actually the problem is still there.  I was getting continuous crashes on my HWF.

0 Karma

hrawat_splunk
Splunk Employee

That crash is still an issue and will be fixed. It happens if forceTimebasedAutoLB=true.
Workaround for the 9.0.1 TcpOutputQ crash: set one of the following.

forceTimebasedAutoLB=false

or

autoBatch=false

or

connectionsPerTarget=1

This crash applies if the UF/HF resolves fewer than 10 target IP addresses and forceTimebasedAutoLB=true.
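
For illustration, the three alternatives above might look like this in outputs.conf on the affected UF/HF. The global [tcpout] stanza is an assumption here; if the forwarder sets these values per target group, the change would go in that group's stanza. Only one of the three lines should be needed.

```ini
# outputs.conf on the UF/HF (sketch; stanza placement is an assumption)
[tcpout]
# Pick ONE of the three workarounds for the 9.0.1 TcpOutputQ crash.
forceTimebasedAutoLB = false
# autoBatch = false
# connectionsPerTarget = 1
```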

0 Karma