We have a support ticket open, but I thought I'd also ask the community. Since upgrading Splunk to 8.0.1, this one HF has been spewing "TcpOutputProc - Possible duplication of events" for most channels, as well as "TcpOutputProc - Applying quarantine to ip=xx.xx.xx.xx port=9998 _numberOfFailures=2".
We upgraded on the 15th near midnight. This is a daily count of those errors from that host.
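(For reference, a search along these lines should reproduce this count; the exact filters are my best guess, with the host name taken from the channel search further down.)
index=_internal host=ghdsplfwd01lps sourcetype=splunkd component=TcpOutputProc log_level=WARN ("Possible duplication of events" OR "Applying quarantine")
| timechart span=1d count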
2020-02-14 0
2020-02-15 623
2020-02-16 923874
2020-02-17 396920
2020-02-18 678568
2020-02-19 602100
2020-02-20 459284
2020-02-21 1177642
Here is a count from the indexer cluster showing the number of blocked=true events. One would expect these to be similar in count if the indexers were telling the HF to go elsewhere because their queues were full.
index=_internal host=INDEXERNAMES sourcetype=splunkd source=/opt/splunk/var/log/splunk/metrics.log blocked=true component=Metrics
| timechart span=1d count by source
2020-02-14 7
2020-02-15 180
2020-02-16 260
2020-02-17 15
2020-02-18 18
2020-02-19 2415
2020-02-20 1
2020-02-21 2
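(Side note: metrics.log queue lines carry the queue name in the name field on group=queue events, so the same count split by queue instead of by source would show which queue, if any, is actually blocking. This variant is just a sketch, not the exact search we ran.)
index=_internal host=INDEXERNAMES sourcetype=splunkd source=/opt/splunk/var/log/splunk/metrics.log component=Metrics group=queue blocked=true
| timechart span=1d count by name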
Lastly, it's not just one source or channel; it's everything from the host.
index=_internal component=TcpOutputProc host=ghdsplfwd01lps log_level=WARN duplication
| rex field=event_message "channel=source::(?<channel>[^|]+)"
| stats count by channel
/opt/splunk/var/log/introspection/disk_objects.log 51395
/opt/splunk/var/log/introspection/resource_usage.log 45470
mule-prod-analytics 42192
/opt/splunk/var/log/splunk/metrics.log 28283
web_ping://PROD_CommerceHub 27881
web_ping://V8_PROD_CustomSolr5 27877
web_ping://V8_PROD_WebServer4 27873
web_ping://EnterWorks PRD 27871
web_ping://RTP DEV 27870
web_ping://Ensighten 27869
web_ping://RTP 27867
bandwidth 20570
cpu 19949
iostat 19946
ps 19821
Any ideas?
Hi
If you have many separate transforms in props.conf for individual sources/sourcetypes etc., try to combine them into one line, e.g.
TRANSFORMS-foo = foo1
TRANSFORMS-bar = bar1
To
TRANSFORMS-foobar = foo1, bar1
This helped in our case after updating from 6.6.5 to 7.3.3.
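(As a sketch of what that looks like in context, with a made-up sourcetype name; the transforms themselves are defined as usual in transforms.conf:)
# props.conf, before
[my_sourcetype]
TRANSFORMS-foo = foo1
TRANSFORMS-bar = bar1
# props.conf, after
[my_sourcetype]
TRANSFORMS-foobar = foo1, bar1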
Ismo
The HF is still "sick" but here are some things we did that seemed to help.
I'm a little concerned about #2 there. We could still be having issues with the outputs; only now the events are being dropped on the floor. In other words, the condition may still be present and we have simply turned off the logging by removing useAck.
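(For context, useACK is the indexer-acknowledgment setting on the forwarder's tcpout group in outputs.conf; the group and server names below are made up, but this is roughly the change in question:)
# outputs.conf on the HF
[tcpout:primary_indexers]
server = idx1.example.com:9998, idx2.example.com:9998
# With useACK = true the forwarder resends anything the indexers do not
# acknowledge, which is what produces the "Possible duplication of events"
# warnings when acks go missing. Setting it to false (or removing it; false
# is the default) silences the warnings, but unacknowledged events in flight
# during a failure can be lost silently.
useACK = false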