Splunk AppDynamics

Race Condition in PHP Agent

CommunityUser
Splunk Employee

I'm currently investigating a problem with the PHP Agent, which fails occasionally. Two errors occur:

  • Segmentation fault (core dumped)
  • PHP Warning: Unsupported exit call type (missing delegate) in /srv/www/...

A colleague of mine already contacted AppDynamics, but unfortunately he is on holiday, so I can't follow up on that discussion. Their suggestion was to enable trace-level logging, which I did, and that allowed me to capture some information about these issues.

The second case seems to be caused by a timeout from the service that provides the configuration (proxy agent?): "[config.ZMQConfigTransport] timed out (2000us) waiting for config. Actual wait time: 1846us". That's pretty short, particularly since the machine only has a single CPU.
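To see how often this timeout actually fires, the trace log can be scanned for that message. The following is a minimal sketch, assuming every timeout line has the same shape as the single line quoted above; the function name find_config_timeouts is made up for this example, and the real log format may differ.

```python
import re

# Pattern modelled on the one log line quoted in this post (an assumption:
# other agent versions may word the message differently).
TIMEOUT_RE = re.compile(
    r"\[config\.ZMQConfigTransport\] timed out \((\d+)us\) waiting for config\. "
    r"Actual wait time: (\d+)us"
)

def find_config_timeouts(lines):
    """Return (configured_us, actual_us) pairs for every timeout line found."""
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            results.append((int(match.group(1)), int(match.group(2))))
    return results

# The two log lines quoted in this post, used as sample input.
sample = [
    "[config.ZMQConfigTransport] timed out (2000us) waiting for config. "
    "Actual wait time: 1846us",
    "[agent] started API exit call of type EXIT_HTTP",
]
print(find_config_timeouts(sample))  # [(2000, 1846)]
```

Note that in the quoted line the reported wait (1846us) is shorter than the configured limit (2000us), which a scan like this makes easy to spot across a whole log.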

The first case is harder. Comparing a successful call with a failed one doesn't turn up any significant differences. The last line in the log is "[agent] started API exit call of type EXIT_HTTP"; after that, the segfault kills the process. Running the program in a debugger showed that the topmost frames are in libstdc++, in std::string::assign() in particular.

Some system details...

  • Linux 64 bit on an AWS-hosted VM
  • AppDynamics versions 4.2 and 4.3 both show the same issue.
  • No webserver installed; this is a pure CLI application. The problem also occurred on a production machine with a webserver, after which we isolated it to a single machine.
  • Trace-level logging seems to reduce the likelihood of the problem occurring.

So... any suggestions on what to try next? Is there a bug tracking system where I could search for similar issues?

Thanks everyone!

Uli

Ayush_Ghosh
Path Finder

Hi,

About the two issues you are facing:

  • Segmentation fault (core dumped)
    • Do you have a core dump?
    • Is OpCache enabled? If so, could you try disabling it and check whether the problem can still be reproduced?
  • PHP Warning: Unsupported exit call type (missing delegate) in /srv/www/...
    • Are you getting this continuously throughout the application lifecycle?
    • When the application starts, there is a delay before the agent gets the config from the Controller, and that is when this warning appears. It should go away once the agent has received the config.
    • The 2000us wait for the config is between the PHP process and the Proxy. For IPC, that is not too short a timeout.

Would it be possible to share the trace-level logs and the core dump, if any? If so, I will provide SFTP credentials for the upload.

Please don't attach them here, as they might contain sensitive information.

Thanks

Ayush

CommunityUser
Splunk Employee

Hello Ayush,

Concerning the segmentation fault:

  • I don't have a core file yet, but I could surely produce one.
  • OpCache was not enabled ("opcache.enable_cli => Off => Off"). Even with the module disabled as a whole, I can still produce the segfaults.
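For anyone who needs to produce a core file for a sporadic crash like this, one option is to raise the core-file size limit for just the failing command. This is only a sketch under assumptions: the helper name run_with_core_dumps and the stand-in command are made up for illustration, and where the kernel writes the core file depends on /proc/sys/kernel/core_pattern on the machine.

```python
import resource
import subprocess
import sys

def run_with_core_dumps(command):
    """Run `command` with the core-file size limit raised as far as allowed.

    Returns the exit code; a negative value -N means the process was killed
    by signal N (for example -11 for SIGSEGV).
    """
    def raise_core_limit():
        # Runs in the child just before exec: lift the soft limit to the hard limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    return subprocess.run(command, preexec_fn=raise_core_limit).returncode

# Harmless stand-in command; replace it with the failing PHP CLI invocation.
EXAMPLE_COMMAND = [sys.executable, "-c", "pass"]
```

Running the failing command through such a wrapper in a loop until the return code turns negative is one way to catch a crash that only happens occasionally.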

Concerning the missing delegate:

  • The error only occurs sporadically. According to the logs, the response usually comes after less than 2ms. Often, the timestamp doesn't show any delay at all, so it's less than 1ms then.
  • Concerning the timeout, consider a single-CPU machine. When the request is sent, the scheduler needs to switch to the proxy. If any other processes are in the ready state, those processes will get a timeslice first, which can easily exceed a millisecond. The same applies to a multicore machine under load, btw. Is there a way to tweak that timeout? Also, what are the side effects of the timeout? If it's just one sample not being reported and that doesn't happen often, I could live with that or maybe tweak the scheduler settings.
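To get a feel for whether a 2000us budget is realistic on a given machine, one could measure how long a trivial request/response between two local processes takes. The sketch below is only an illustration: it uses a pipe to a child Python process rather than the agent's actual ZMQ transport, and the names measure_round_trips and TIMEOUT_US are made up for this example.

```python
import subprocess
import sys
import time

TIMEOUT_US = 2000  # the limit quoted in the agent's log message

# Child process: echo every line back immediately (a stand-in for the proxy).
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)

def measure_round_trips(samples=200):
    """Return round-trip times in microseconds for a pipe-based echo."""
    child = subprocess.Popen(
        [sys.executable, "-u", "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    timings = []
    try:
        for _ in range(samples):
            start = time.perf_counter()
            child.stdin.write(b"ping\n")
            child.stdin.flush()
            child.stdout.readline()  # blocks until the child has been scheduled
            timings.append((time.perf_counter() - start) * 1e6)
    finally:
        child.stdin.close()   # EOF lets the child's loop end
        child.stdout.close()
        child.wait()
    return timings

def count_over_timeout(timings, limit_us=TIMEOUT_US):
    """Count how many round trips took longer than the limit."""
    return sum(1 for t in timings if t > limit_us)
```

Running measure_round_trips() once on an idle machine and once while other processes compete for the CPU, then comparing count_over_timeout() for both runs, gives a rough idea of how much scheduling delay alone contributes.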

I can share the trace-level logs and probably also the core dump.

Thanks for your help!

Uli

CommunityUser
Splunk Employee

Short update: Concerning the segfaults, a support ticket was created and the AppDynamics team has already been able to reproduce the issue, so a fix shouldn't be too far off.

CommunityUser
Splunk Employee

Hi!

We are facing the same problem. I had hoped it would be solved in the latest version of the AppDynamics PHP agent, but it doesn't look like it.

We are getting segmentation faults on Apache using the AppDynamics PHP agent 4.4.3 (latest) and PHP 5.4.

Any help?

Thanks!

CommunityUser
Splunk Employee

Hi!

At the moment, we are not using AppDynamics on the systems that were formerly impacted by that bug. Also, since the AppDynamics dev team was able to reproduce the bug and fix it after it was filed, I believe that it shouldn't be a concern. Maybe it's a different bug with similar symptoms.

Two notes though:

  • 4.4.3 is not an AppDynamics version number, as they don't use a triple but a quadruple (e.g. 4.2.13.1). Also, it might be relevant which versions of both the machine agent and the PHP agent were installed.
  • PHP 5.4 is something I personally would refuse to support in any way. At the very least, PHP 5.6 should be used. Still, check the announcements on PHP's website: their support for versions before 7.1 ends this year.
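As a small illustration of the version format mentioned in the first note, a check for a four-part version string could look like the sketch below. It relies only on the format as described in this thread (four dot-separated numbers), not on any official specification, and the name is_quad_version is made up for this example.

```python
import re

# Four dot-separated numeric fields, e.g. 4.2.13.1 (format as described in this thread).
QUAD_VERSION_RE = re.compile(r"\d+(\.\d+){3}")

def is_quad_version(text):
    """Return True if `text` consists of exactly four dot-separated numbers."""
    return QUAD_VERSION_RE.fullmatch(text) is not None

print(is_quad_version("4.2.13.1"))  # True
print(is_quad_version("4.4.3"))     # False
```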

Good luck!

Uli
