Hi,
Right now I have the problem that the agents are no longer reporting correctly.
In the log files I see a lot of ERROR messages:
[AD Thread Pool-Global1] 10 Mär 2015 15:46:10,119 ERROR RequestSegmentDataQueue - Fatal transport error: Read timed out
[AD Thread Pool-Global1] 10 Mär 2015 15:46:10,119 ERROR RequestSegmentDataQueue - Could not send snapshots to controller Fatal transport error: Read timed out
[AD Thread Pool-Global3] 10 Mär 2015 15:46:20,791 ERROR RequestSegmentDataQueue - Fatal transport error: Connection reset
[AD Thread Pool-Global3] 10 Mär 2015 15:46:20,791 ERROR RequestSegmentDataQueue - Could not send snapshots to controller Fatal transport error: Connection reset
Any idea what is causing this?
Prior to those error messages I see the following error in the log:
ERROR MetricHandler - Error registering metrics
com.singularity.ee.agent.commonservices.metricgeneration.metrics.e: Error registering metrics with controller Fatal transport error: Connection reset
    at com.singularity.ee.agent.appagent.kernel.ub.a(ub.java:125)
    at com.singularity.ee.agent.commonservices.metricgeneration.a.a(a.java:169)
    at com.singularity.ee.agent.commonservices.metricgeneration.d.a(d.java:270)
    at com.singularity.ee.agent.commonservices.metricgeneration.g.run(g.java:103)
    at com.singularity.ee.util.javaspecific.scheduler.n.run(n.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at com.singularity.ee.util.javaspecific.scheduler.y.e(y.java:315)
    at com.singularity.ee.util.javaspecific.scheduler.a.b(a.java:150)
    at com.singularity.ee.util.javaspecific.scheduler.b.a(b.java:123)
    at com.singularity.ee.util.javaspecific.scheduler.b.b(b.java:208)
    at com.singularity.ee.util.javaspecific.scheduler.b.run(b.java:238)
    at com.singularity.ee.util.javaspecific.scheduler.i.a(i.java:683)
    at com.singularity.ee.util.javaspecific.scheduler.i.run(i.java:715)
    at java.lang.Thread.run(Thread.java:745)
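To see how often each transport error occurs, the ERROR lines can be tallied per logger and cause. The following is only a rough sketch: the function name `count_transport_errors` and the regular expression are my own assumptions based on the log lines quoted above, not something shipped with the agent.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this thread:
# [thread] <date> <time> ERROR <logger> - <message> Fatal transport error: <cause>
TRANSPORT_ERROR = re.compile(
    r"ERROR\s+(?P<logger>\S+)\s+-\s+.*Fatal transport error:\s*(?P<cause>.+?)\s*$"
)

def count_transport_errors(lines):
    """Count 'Fatal transport error' lines per (logger, cause) pair."""
    counts = Counter()
    for line in lines:
        match = TRANSPORT_ERROR.search(line)
        if match:
            counts[(match.group("logger"), match.group("cause"))] += 1
    return counts

# Sample lines copied from the log excerpt above.
SAMPLE = [
    "[AD Thread Pool-Global1] 10 Mär 2015 15:46:10,119 ERROR RequestSegmentDataQueue - Fatal transport error: Read timed out",
    "[AD Thread Pool-Global1] 10 Mär 2015 15:46:10,119 ERROR RequestSegmentDataQueue - Could not send snapshots to controller Fatal transport error: Read timed out",
    "[AD Thread Pool-Global3] 10 Mär 2015 15:46:20,791 ERROR RequestSegmentDataQueue - Fatal transport error: Connection reset",
    "[AD Thread Pool-Global3] 10 Mär 2015 15:46:20,791 ERROR RequestSegmentDataQueue - Could not send snapshots to controller Fatal transport error: Connection reset",
]

for (logger, cause), count in sorted(count_transport_errors(SAMPLE).items()):
    print(f"{logger}: {cause} x{count}")
```

For a real log, the lines of the agent log file would be passed in instead of `SAMPLE`.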
Hi Constantin,
We see such errors when there are network connectivity problems between the instance where the AppServerAgent is installed and the controller. As we understand it, your agent is trying to register with the SaaS account at https://medicalcolumbusag.saas.appdynamics.com, you have provided the account-name and access-key details in <agent_dir>/conf/controller-info.xml in addition to the controller host and port, and you have restarted the JVM but still see the issue.
If that is not the case, please send a zipped archive of the <AppServerAgent_dir>/logs and <AppServerAgent_dir>/conf folders, and also provide the output of the following commands run from the agent instance:
shell> telnet medicalcolumbusag.saas.appdynamics.com 443
shell> telnet medicalcolumbusag.saas.appdynamics.com 80 (if you are using the HTTP port in the agent config)
- Please also confirm whether there is any proxy between the agent instance and the controller.
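If telnet is not available on the agent machine, the same reachability check can be sketched in Python. The helper name `can_connect` and the 5-second default timeout are assumptions for illustration. Note that a successful connect only shows that the port is reachable; "Read timed out" and "Connection reset" happen after the connection is open, so a passing check does not rule out a proxy or firewall interfering later.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False

# Example calls, mirroring the two telnet commands above:
#   can_connect("medicalcolumbusag.saas.appdynamics.com", 443)
#   can_connect("medicalcolumbusag.saas.appdynamics.com", 80)
```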
Let us know if that answers your question, and please send the requested details so we can debug further.
In addition, we see that a few apps in the SaaS UI are reporting fine, so this could be either a network issue or an agent registration issue on the affected agent instance. Please send us the logs so we can assist you further.
Regards,
Arun
Hi,
OK, here is what I have done so far:
1. I have reset the agent so it reloads. No change.
2. I have restarted the services. Still the same effect.
3. I just tested the telnet connections, and they work fine.
What confuses me is that the agents reporting to the controller are located in different networks and locations:
One server reports from our on-premises network (and has this issue).
The rest of the servers are located in an Amazon VPC in Frankfurt (and show the same issue).
So it seems that this is not a network issue on the agents' side but might be one on the controller side. However, I cannot analyze that because it is a SaaS controller.
Regards
Constantin
Hi Constantin,
Can you provide the following so that we can assist you further:
a) an archive of the <AppServerAgent_dir>/logs folder
b) a screenshot of the controller UI screen you were referring to, for clarity
Regards,
Arun
Hi Constantin,
Although the logs contain fatal errors, the agent logs are for node "192.168.100.105" of application "transactor", and both your screenshot and the attached screenshot show that data is now reporting fine.
We see that the issue no longer exists, and data for the past day also looks fine in the app dashboard. Let us know if you need further assistance on this.
Hi,
I'm not sure about that. I have those fatal errors in the log, and I am afraid that some data might be missing. It is true that we currently have reports.
But in the time slot around 13:15 - 13:20 today there was no data, and there still isn't, so monitoring data is definitely lost.
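One way to check whether a gap like this lines up with the transport errors is to filter the agent log for ERROR lines inside the time slot. This is a minimal sketch under some assumptions: `errors_in_window` is a hypothetical helper, it only looks at the time of day (the date is ignored, so it should be run on a single day's log), and it expects timestamps in the `HH:MM:SS,mmm` form seen in the log excerpt earlier in this thread.

```python
import re
from datetime import time

# Time of day as written in the agent log, e.g. "15:46:10,119".
TIMESTAMP = re.compile(r"\b(\d{2}):(\d{2}):(\d{2}),\d{3}\b")

def errors_in_window(lines, start, end):
    """Return the ERROR lines whose time of day lies in [start, end]."""
    hits = []
    for line in lines:
        if " ERROR " not in line:
            continue
        match = TIMESTAMP.search(line)
        if not match:
            continue
        hour, minute, second = (int(part) for part in match.groups())
        if start <= time(hour, minute, second) <= end:
            hits.append(line)
    return hits

# Lines copied from the log excerpt posted earlier in this thread.
SAMPLE = [
    "[AD Thread Pool-Global1] 10 Mär 2015 15:46:10,119 ERROR RequestSegmentDataQueue - Fatal transport error: Read timed out",
    "[AD Thread Pool-Global3] 10 Mär 2015 15:46:20,791 ERROR RequestSegmentDataQueue - Fatal transport error: Connection reset",
]

# For the gap described above, the window would be time(13, 15) to time(13, 20).
for line in errors_in_window(SAMPLE, time(15, 46, 0), time(15, 46, 15)):
    print(line)
```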
Regards
Constantin
Hi Constantin,
Can you provide a screenshot from the UI that shows the issue, and also create and share a custom time range with us so that we can drill down from our end?
Data is not persisted for a period during which a prolonged network connectivity issue exists at the agent end.
Regards,
Arun