Splunk AppDynamics

Common agent error in logs: Connection back off limitation in effect

Dietrich_Meier
Communicator

We recently had short metric gaps in the Controller UI (SaaS Controller) for several apps and different agents (DB, App, and Machine).

The log files of all the different agents all have a common theme:
"Connection back off limitation in effect"

"Fatal transport error while connecting to URL" also comes up sometimes as a similar error logged by agents.

I did a quick search online and this seems to be an AppD agent-specific log entry. The AppD community also has about 12 posts going back to 2017, none with a clear solution to this error message (summary below).
A search of the docs site returns nothing.

I opened an AppD support case and will see what they say, but it is frustrating that this error is reported by so many different agents without a clear cause documented anywhere. The wording of the message makes me think it has something to do with a limitation on the Controller side, while all the other community posts and agent logs make it look like it is not Controller related.

Examples of our recent issue:
* I tried to redact the sensitive bits

DB Agent v23.2.2
[Entity-Registration-Scheduler-19] 31 Oct 2023 10:50:25,932 WARN EntityRegistrar - Fail to register [DBSession] entities:
java.lang.RuntimeException: Connection back off limitation in effect: /controller/instance/***/registerServerSatelliteEntity
at com.singularity.ee.agent.dbagent.task.reporter.EntityRegistrar.registerEntities(EntityRegistrar.java:276) ~[db-agent.jar:Database Agent v23.2.0.0 GA compatible with 4.5.2.0 Build Date 2023-02-22]

Other DB Agent v23.2.2
[<**DB Collector Name***>-Transient-Event-Scheduler-2] 31 Oct 2023 10:51:22,737 WARN SystemAgentTransientEventChannel - Error sending event data to controller: Connection back off limitation in effect: /controller/instance/***/transient-channel


Different DB agent v23.8.8
[<**DB Collector Name***>-Scheduler-3] 31 Oct 2023 10:51:52,288 INFO ADBCollector - Collected one-minute data for ***
[Entity-Registration-Scheduler-2] 31 Oct 2023 10:51:52,850 WARN EntityRegistrar - Fail to register [Query] entities:
java.lang.RuntimeException: Connection back off limitation in effect: /controller/instance/3945944/registerSQLQuery

SIM (Machine)Agent
**ServerName**==> [AD Thread-Metric Reporter0] 31 Oct 2023 10:51:56,554 ERROR ManagedMonitorDelegate - Error sending metrics - will requeue for later transmission
com.singularity.ee.agent.commonservices.metricgeneration.metrics.MetricSendException: Connection back off limitation in effect: /controller/instance/***/metrics

SIM Agent v22x
***Hostname***==> [AD Thread-Metric Reporter0] 31 Oct 2023 10:51:48,204 ERROR ManagedMonitorDelegate - Fatal transport error while connecting to URL [/controller/instance/***/metrics]: org.apache.http.conn.ConnectTimeoutException: Connect to ***:443 [***/***, ***, ***] failed: connect timed out
***Hostname***==> [AD Thread-Metric Reporter0] 31 Oct 2023 10:51:48,204 WARN ManagedMonitorDelegate - Error sending metric data to controller:Fatal transport error while connecting to URL [/controller/instance/***/metrics]
***Hostname***==> [AD Thread-Metric Reporter0] 31 Oct 2023 10:51:48,204 ERROR ManagedMonitorDelegate - Error sending metrics - will requeue for later transmission
com.singularity.ee.agent.commonservices.metricgeneration.metrics.MetricSendException: Fatal transport error while connecting to URL [/controller/instance/***/metrics]
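To line entries like the ones above up with the metric gaps in the Controller UI, it can help to count the two error messages per minute across the agent log files. The following is only a rough sketch, assuming the timestamp format shown in the extracts (`31 Oct 2023 10:50:25,932`) and an English locale for month names; the function name and the labels `backoff` and `transport` are made up for illustration. Stack-trace lines without their own timestamp are attributed to the most recent timestamp seen, so one failure can show up as more than one matching line.

```python
import re
from collections import Counter
from datetime import datetime

# The two messages discussed in this thread, with illustrative labels.
PATTERNS = {
    "backoff": "Connection back off limitation in effect",
    "transport": "Fatal transport error while connecting to URL",
}

# Matches timestamps such as "31 Oct 2023 10:50:25,932"; captures up to the minute.
TS_RE = re.compile(r"(\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}):\d{2},\d{3}")


def count_errors_per_minute(lines):
    """Return a Counter keyed by (minute, label) of matching log lines."""
    counts = Counter()
    last_minute = None  # timestamp carried over to lines that have none
    for line in lines:
        match = TS_RE.search(line)
        if match:
            parsed = datetime.strptime(match.group(1), "%d %b %Y %H:%M")
            last_minute = parsed.strftime("%Y-%m-%d %H:%M")
        if last_minute is None:
            continue  # no timestamp seen yet, nothing to attribute the line to
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(last_minute, label)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_errors.py <agent log file> [more log files ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for (minute, label), n in sorted(count_errors_per_minute(handle).items()):
                print(f"{path}\t{minute}\t{label}\t{n}")
```

If the minutes with the highest counts match the gaps on several hosts at once, that points at something shared between them (network path, proxy, DNS) rather than at a single agent.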


Summary of other AppD community posts with a similar error from agent log files:

2017 Community post
https://community.appdynamics.com/t5/NET-Agent-Installation/Azure-Cloud-Service-No-load-detected-App...
No solutions in ticket/unresolved

2017 Community post no 2
https://community.appdynamics.com/t5/Dynamic-Languages-Node-JS-Python/Could-not-connect-to-the-contr...
Python agent issues
Mentions proxy setup for outbound requests from the agent server, but no clear answer other than bringing the node online on the Controller, whatever that means

2017 Community post no3
https://community.appdynamics.com/t5/NET-Agent-Installation/Failed-to-add-web-app-to-AppDynamics/td-...
No confirmed solution, but the last posts suggest using non-SSL settings, which is not a great solution if that is the fix

2017 Community post no4
https://community.appdynamics.com/t5/NET-Agent-Installation/net-Agent-registering-issue/td-p/29595
Proxy setting highlighted but no ultimate solution

2018 Community post
https://community.appdynamics.com/t5/NET-Agent-Installation/BT-requests-and-survival/td-p/29629
Answers do not address the "Connection back off limitation in effect" issue

2018 Community post no2
https://community.appdynamics.com/t5/NET-Agent-Installation/After-NET-Agent-upgrade-to-4-3-7-1-we-ar...
Issue shown in one log file extract but not addressed

2018 Community post no3
https://community.appdynamics.com/t5/Controller-SaaS-On-Premises/Unable-to-connect-to-the-controller...
No final solution

2018 Community post no4
https://community.appdynamics.com/t5/NET-Agent-Installation/Need-help-on-installation-of-agent/td-p/...
Post never had a resolution

2019 Community post
https://community.appdynamics.com/t5/NET-Agent-Installation/no-metrics-in-controller-after-net-agent...
Possible issue with AppDynamicsConfig.json
No clear answer/solution

2019 Community post no2
https://community.appdynamics.com/t5/NET-Agent-Installation/Net-core-agent-Linux-is-not-connecting-t...
No solution

2021 Community post
https://community.appdynamics.com/t5/Knowledge-Base/How-do-I-install-the-NET-Core-Microservices-Agen...
Answers do not address the "Connection back off limitation in effect" issue

2023 Community post
https://community.appdynamics.com/t5/Controller-SaaS-On-Premises/Could-not-connect-to-the-controller...
Suggests ignoring or disabling the errors

Here's hoping there is a solution or a better answer to this issue.



Cansel_OZCAN
Path Finder

Hi Dietrich,

This log pattern is a very generic one. Unfortunately, there are many possible causes, but most of them are related to the Controller side.

Can you please run the grep commands below against the server* files, which are located in the /appdynamics/controller/server folder?

grep -i "Buffer Overflow" server* | wc -l

grep -i "dropping event" server* | wc -l

grep -i "Caused by: java.nio.BufferOverflowException" server* | wc -l

Then let's see what the root cause of your problem is 🙂

Thanks

Cansel

Dietrich_Meier
Communicator

Hi Cansel

It's a SaaS Controller, so I cannot run those commands on the Controller.
AppD Support also informed me that this is not a Controller-side issue, but rather a general network issue.
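Since the SIM agent log above shows a `ConnectTimeoutException` on port 443, one way to follow up on the "general network issue" answer is to test plain TCP reachability from the agent host to the Controller. This is a minimal sketch, not an AppD tool: the function name is made up, and the Controller host has to be filled in by whoever runs it. It only checks that a TCP connection can be opened directly; it does not go through a proxy and does not validate TLS, so it will not reflect an agent that is configured to use a proxy.

```python
import socket
import time


def check_tcp_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False, time.monotonic() - start, repr(exc)


if __name__ == "__main__":
    import sys

    # Usage: python check_connect.py <controller host> [port]
    target_host = sys.argv[1]
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 443
    ok, elapsed, error = check_tcp_connect(target_host, target_port)
    print(f"{target_host}:{target_port} ok={ok} elapsed={elapsed:.2f}s error={error}")
```

Running it on a schedule from an affected host and comparing failures with the timestamps in the agent logs would show whether connect timeouts happen outside the agents as well.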



Cansel_OZCAN
Path Finder

Hi Dietrich,

Yes, that is another possible reason for this error.

Thanks for letting me know.


iamryan
Community Manager

Hi @Dietrich.Meier,

Thanks for sharing this. I've shared it with the people who I think should see it. 
