Hi All,
I have this strange behaviour and was wondering if anyone has seen this before. I'll explain below the problem.
SInce I upgraded the controller to to version 4.0.7.0 build 40 it seems that the metrics that have the 1 hour granularity (ex: Last 3 days) are no longer displayed at tier level:
1. Take the tier1 as example (this tier has 2 nodes underneath) and display the calls/min graph for the last 15 mins - all looks good
2. Change the time range to Last 3 days - Metrics are no longer displayed - empty graph
3. The thing is that if I display the same graph for the same time range (Last 3 days) for the nodes underneath tier1, the graph is properly displayed - so the issue seem to be related to the aggregated metrics for tier
For now I have run a mysql check just to make sure the tables are not corrupted - all looks good.
Have also set the controller logs on FINE but no errors are logged while I display the Last 3 days graph.
Any help is appreciated.
Thanks,
Claudiu
Hi Claudiu,
Can you quickly try step wise restart of contorller appserver and mysql db (stopController.bat/sh and then issue startController.bat/sh from controller_dir/bin path) and login UI and replicate the issue and send the screenshots from UI and also database.logs and 3-4 recent server.log under <Contorller_dir>/logs path for initial review on this
Hi Akumar,
As requested I have attached the logs and screenshot.
Investihgated further I have found the below:
Here is the query to get the data that is not properly displayed:
###########
SELECT TS_MIN, METRIC_ID, sum_val,count_val,min_val,max_val,cur_val, time_rollup_type, M.cluster_rollup_type, M.name, MD.application_component_instance_id, group_count_val FROM metricdata_hour_agg MD FORCE INDEX(hr_of_day) STRAIGHT_JOIN metric M WHERE ( (metric_id IN (1506378) AND application_component_instance_id = 296) ) AND ts_min IN (24015180,24015240,24015300,24015360,24015420,24015480,24015540,24015600,24015660,24015720,24015780,24015840,24015900,24015960,24016020,24016080,24016140,24016200,24016260,24016320,24016380,24016440,24016500,24016560,24016620,24016680,24016740,24016800,24016860,24016920,24016980,24017040,24017100,24017160,24017220,24017280,24017340,24017400,24017460,24017520,24017580,24017640,24017700,24017760,24017820,24017880,24017940,24018000,24018060,24018120,24018180,24018240,24018300,24018360,24018420,24018480,2401850,24018600,24018660,24018720,24018780,24018840,24018900,24018960,24019020,24019080,24019140,24019200,24019260,24019320,24019380,24019440 ) AND MD.metric_id = M.id ORDER BY TS_MIN
############
And could also find the query that should populate the metricdata_hour_agg, which is below. Also I can see that all partitions for this table, since the upgrade, they have the same dimension which points out that data is not being inserted. This is backed up by an error which I’ll post below.
###insert statement###
INSERT IGNORE INTO metricdata_hour_agg (ts_min, metric_id, rollup_type, cluster_rollup_type, application_component_instance_id, application_id, group_count_val, sum_val, count_val, min_val, max_val, cur_val, weight_value_square, weight_value, hr_of_day, hr_of_week, hr_of_month) SELECT MD.ts_min, MD.metric_id, MAX(MD.rollup_type) rollup_type, MAX(MD.cluster_rollup_type) cluster_rollup_type, MD.application_component_instance_id, MAX(MD.application_id), COUNT(*) * 100, SUM(MD.sum_val), SUM(MD.count_val), (CASE MD.rollup_type WHEN 0 THEN MIN(MD.min_val) WHEN 1 THEN MIN(MD.sum_val) WHEN 2 THEN MIN(MD.cur_val) ELSE 0 END), (CASE MD.rollup_type WHEN 0 THEN MAX(MD.max_val) WHEN 1 THEN MAX(MD.sum_val) WHEN 2 THEN MAX(MD.cur_val) ELSE 0 END), SUM(MD.cur_val), SUM(MD.weight_value_square), SUM(MD.weight_value), MAX(MD.hr_of_day), MAX(MD.hr_of_week), MAX(MD.hr_of_month) FROM metricdata_hour MD USE INDEX (PRIMARY) WHERE ((ts_min DIV 60) % <MOD_ID>) = <MOD_POS> AND MD.ts_min BETWEEN 24019441 AND 24019559 GROUP BY MD.ts_min, MD.metric_id, MD.application_component_instance_id ORDER BY MD.ts_min, MD.metric_id, MD.application_component_instance_id
##################
Error found in application logs corresponding to the insert statement:
####
[#|2015-09-02T08:18:50.606+0000|SEVERE|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=6407;_ThreadName=Thread-5;|failed doing rollup: metricdata_hour_agg :
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key 'PRIMARY' doesn't exist in table 'MD'
at sun.reflect.GeneratedConstructorAccessor540.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
####
Looking at metricdata_hour_agg table it does not have primary keys set but all other similar tables (like metricdata_min_agg) have primary keys.
###metricdata_hour_agg###
+-----------------------------------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------------+-----------------------+------+-----+---------+-------+
| ts_min | int(11) | NO | MUL | NULL | |
| metric_id | int(11) | NO | MUL | NULL | |
| rollup_type | tinyint(4) | NO | | NULL | |
| cluster_rollup_type | tinyint(4) | NO | | NULL | |
| application_component_instance_id | smallint(5) unsigned | NO | | NULL | |
| application_id | smallint(5) unsigned | NO | | NULL | |
| group_count_val | mediumint(8) unsigned | NO | | NULL | |
| count_val | bigint(20) unsigned | NO | | NULL | |
| sum_val | bigint(20) | NO | | NULL | |
| min_val | bigint(20) | NO | | NULL | |
| max_val | bigint(20) | NO | | NULL | |
| cur_val | bigint(20) | NO | | NULL | |
| weight_value_square | double | NO | | NULL | |
| weight_value | double | NO | | NULL | |
| hr_of_day | int(11) | NO | | NULL | |
| hr_of_week | int(11) | NO | | NULL | |
| hr_of_month | int(11) | NO | | NULL | |
+-----------------------------------+-----------------------+------+-----+---------+———+
####
Please let me know your thoughts.
Thanks,
Claudiu
Hi Claudiu,
Apologies for delay in response, We could not locate the error "MySQLSyntaxErrorEx
a) Confirm do you have cold backup of /var/appdynamics/appdynamics-controller/db/data if yes how much long back you have taken the backup
b) what other tables are corrupted or raised similar errors?
c) Outout of below commands:
shell> cd /var/appdynamics/appdynamics-controller/bin/
shell> ./controller.sh login-db
mysql> show create table metricdata_hour_agg \G;
mysql> show tables \G;
d) Output of below commands:
shell> ls -ltr /var/appdynamics/appdynamics-controller/bin/metricdata_hour_agg
shell> ls -ltr /var/appdynamics/appdynamics-controller/bin/metricdata_*
We planeed to drop and recreate affected tables post stopping controller appserver java process and see how it goes with restart of services, post collecting above details.
On a side note, do not run controller in DEBUG/FINE logging level for long hours this is additional overhead on controller instance.
Hi Akumar,
Thanks for you response. Please find my answers below:
a) We have a test controller and it is not a major thing if we break it. DB is ~ 700 GB so it would take a while back-up it. But it is doable. As a side note I have run the upgrade on teh test controller firts and after on the live controller. The issue is reproducible on both instances. So i would assume we hit a bug.
b)For now m
c) please find the attached sqlqueryfile
d) please find the attached metricdata file
Thanks for your help.
Cheers,
Claudiu
Hi Claudiu,
- We would like clarify that if there are any db errors before upgrade than it will "not" automatically will get fixed by upgrade process, we need to have db with no errors before upgrade process.
Try below steps and see how it goes:
a) If possible as it is test controller , stop both contorller java and mysql and take backup
OR take backup of below all files related to affected table name metricdata_hour_agg, files that have table file names metricdata_hour_agg after stopping controller appserver java and mysql process (contorller.sh stop-appserver, stop-db)
b) Now run below steps:
shell> ./controller.sh stop-appserver
shell> ./controller.sh stop-db
shell> ./controller.sh start-db
shell> ./controller.sh login-db
mysql>
and issue attached sql at mysql prompt
c) and now issue below commands again
shell> ./controller.sh stop-db
shell> ./controller.sh start-db
shell> ./controller.sh start-appserver
Now see how it goes with error and controller ui accessibility.
Keep us posted if you see any issues
Regards,
Arun
Hi,
Thanks for you reply.
Applied the below changes and it seems that there is another table that is missing the PRIMARY key:
mysql> describe metricdata_hour;
+-----------------------------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------------+----------------------+------+-----+---------+-------+
| ts_min | int(11) | NO | MUL | NULL | |
| metric_id | int(11) | NO | MUL | NULL | |
| rollup_type | tinyint(4) | NO | | NULL | |
| cluster_rollup_type | tinyint(4) | NO | | NULL | |
| node_id | int(11) | NO | | NULL | |
| application_component_instance_id | smallint(5) unsigned | NO | | NULL | |
| application_id | smallint(5) unsigned | NO | | NULL | |
| count_val | bigint(20) unsigned | NO | | NULL | |
| sum_val | bigint(20) | NO | | NULL | |
| min_val | bigint(20) | NO | | NULL | |
| max_val | bigint(20) | NO | | NULL | |
| cur_val | bigint(20) | NO | | NULL | |
| weight_value_square | double | NO | | NULL | |
| weight_value | double | NO | | NULL | |
| hr_of_day | int(11) | NO | | NULL | |
| hr_of_week | int(11) | NO | | NULL | |
| hr_of_month | int(11) | NO | | NULL | |
+-----------------------------------+----------------------+------+-----+---------+-------+
17 rows in set (0.01 sec)
The error related to this would be:
[#|2015-09-16T22:30:45.056+0000|SEVERE|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=4596;_ThreadName=Thread-5;|failed doing rollup: metricdata_hour_agg :
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key 'PRIMARY' doesn't exist in table 'MD'
at sun.reflect.GeneratedConstructorAccessor531.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
at com.sun.gjc.spi.base.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:125)
at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler.doRollupParallel(MetricDataRollupHandler.java:69)
at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler.access$000(MetricDataRollupHandler.java:31)
at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler$RollupTask.call(MetricDataRollupHandler.java:457)
at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler$RollupTask.call(MetricDataRollupHandler.java:435)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
|#]
I think we would need to recreate the table: metricdata_hour as well
Also I get the below error after restarting the controller:
[#|2015-09-17T06:58:27.933+0000|SEVERE|glassfish3.1.2|com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge|_ThreadID=1;_ThreadName=Thread-5;|TRUNCATE FAILED TO FINISH: ALTER TABLE metricdata_hour_agg TRUNCATE PARTITION PART
16330
java.sql.SQLException: Incorrect partition name
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1709)
at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1628)
at com.sun.gjc.spi.base.StatementWrapper.executeUpdate(StatementWrapper.java:117)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.alterPartition(PerfDataPartitionPurge.java:379)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsForShard(PerfDataPartitionPurge.java:346)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsDirectly(PerfDataPartitionPurge.java:243)
at com.singularity.ee.controller.beans.agent.db.MetricDataPurger.purgeMetricData(MetricDataPurger.java:353)
at com.singularity.ee.controller.beans.agent.MetricDataWriterBean.processMetricPurges(MetricDataWriterBean.java:952)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
to mitigate this one I am recreating the metricdata_hour_agg as well.
Thanks,
Claudiu
Hi Arun,
Any chance you can provide us with a fix for this?
Perhaps re-creating the metricdata_hour table would solve it, even if that means we'll loose the data in this table.
Other fix might save the date in this table - but any is welcomed now.
Thanks,
Claudiu
Hi Claudiu,
Please issue same steps for new table create script attached like we did earlier to recreate one and if issue persists on step wise in new logs send the list of tables , at this point we will loose old data, Just make sure we are not restarting controller in abnormal way by any chance which leads table corruption in future.
Hi Arun,
Thanks for your response.
For some reason now we get a different error - even though we haev not affected the metricdata_hour_agg_app table
[#|2015-10-01T15:26:00.876+0000|SEVERE|glassfish3.1.2|com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge|_ThreadID=1;_ThreadName=Thread-5;|PURGING FAILED TO FINISH: ALTER TABLE metricdata_hour_agg_app REORGANIZE PARTITION PARTMAX INTO ( PARTITION PART16720 VALUES LESS THAN (24076800), PARTITION PARTMAX VALUES LESS THAN MAXVALUE)
java.sql.SQLException: Duplicate partition name PART16720
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1709)
at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1628)
at com.sun.gjc.spi.base.StatementWrapper.executeUpdate(StatementWrapper.java:117)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.alterPartition(PerfDataPartitionPurge.java:443)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsForShard(PerfDataPartitionPurge.java:296)
at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsDirectly(PerfDataPartitionPurge.java:243)
at com.singularity.ee.controller.beans.agent.db.MetricDataPurger.purgeMetricData(MetricDataPurger.java:353)
at com.singularity.ee.controller.beans.agent.MetricDataWriterBean.processMetricPurges(MetricDataWriterBean.java:952)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1052)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1124)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:5388)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:619)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:800)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:571)
at com.singularity.ee.controller.beans.EntityManagerInterceptor.manageEntityManagerLifecycle(EntityManagerInterceptor.java:53)
at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
Controller fails to start due to this error.
Any thoughts on this?
Thanks,
Claudiu
Hi,
Small correction on my previous post. Controller starts now and the 3 hour metric are working fine.
The only thing is the partition error we gent during the controller restart.
Thanks,
CLaudiu