AppD Archive

Controller does not show the 1 hour granularity metrics for tiers

CommunityUser
Splunk Employee
Splunk Employee

Hi All,

I have this strange behaviour and was wondering if anyone has seen this before. I'll explain below the problem.

SInce I upgraded the controller to to version 4.0.7.0 build 40 it seems that the metrics that have the 1 hour granularity (ex: Last 3 days) are no longer displayed at tier level:

1. Take the tier1 as example (this tier has 2 nodes underneath) and display the calls/min graph for the last 15 mins - all looks good

2. Change the time range to Last 3 days - Metrics are no longer displayed - empty graph

3. The thing is that if I display the same graph for the same time range (Last 3 days) for the nodes underneath tier1, the graph is properly displayed - so the issue seem to be related to the aggregated metrics for tier

For now I have run a mysql check just to make sure the tables are not corrupted - all looks good.

Have also set the controller logs on FINE but no errors are logged while I display the Last 3 days graph.

Any help is appreciated.

Thanks,

Claudiu

0 Karma

Arun_Dasetty
Super Champion

Hi Claudiu,

Can you quickly try step wise restart of contorller appserver and mysql db (stopController.bat/sh and then issue startController.bat/sh from controller_dir/bin path) and login UI and replicate the issue and send the screenshots from UI and also database.logs and 3-4 recent server.log under <Contorller_dir>/logs path for initial review on this

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi Akumar,

As requested I have attached the logs and screenshot.

Investihgated further I have found the below:

Here is the query to get the data that is not properly displayed:

###########

SELECT TS_MIN, METRIC_ID, sum_val,count_val,min_val,max_val,cur_val,   time_rollup_type, M.cluster_rollup_type, M.name, MD.application_component_instance_id, group_count_val FROM metricdata_hour_agg MD  FORCE INDEX(hr_of_day)  STRAIGHT_JOIN metric M WHERE ( (metric_id IN (1506378) AND application_component_instance_id = 296) ) AND    ts_min IN (24015180,24015240,24015300,24015360,24015420,24015480,24015540,24015600,24015660,24015720,24015780,24015840,24015900,24015960,24016020,24016080,24016140,24016200,24016260,24016320,24016380,24016440,24016500,24016560,24016620,24016680,24016740,24016800,24016860,24016920,24016980,24017040,24017100,24017160,24017220,24017280,24017340,24017400,24017460,24017520,24017580,24017640,24017700,24017760,24017820,24017880,24017940,24018000,24018060,24018120,24018180,24018240,24018300,24018360,24018420,24018480,2401850,24018600,24018660,24018720,24018780,24018840,24018900,24018960,24019020,24019080,24019140,24019200,24019260,24019320,24019380,24019440 )  AND MD.metric_id = M.id  ORDER BY TS_MIN

############

And could also find the query that should populate the metricdata_hour_agg, which is below. Also I can see that all partitions for this table, since the upgrade, they have the same dimension which points out that data is not being inserted. This is backed up by an error which I’ll post below.

###insert statement###

INSERT IGNORE INTO metricdata_hour_agg (ts_min, metric_id, rollup_type, cluster_rollup_type, application_component_instance_id, application_id,   group_count_val, sum_val, count_val, min_val, max_val, cur_val, weight_value_square, weight_value, hr_of_day, hr_of_week, hr_of_month) SELECT  MD.ts_min, MD.metric_id, MAX(MD.rollup_type) rollup_type,  MAX(MD.cluster_rollup_type) cluster_rollup_type,    MD.application_component_instance_id, MAX(MD.application_id), COUNT(*) * 100, SUM(MD.sum_val),   SUM(MD.count_val),    (CASE MD.rollup_type WHEN 0 THEN MIN(MD.min_val) WHEN 1 THEN MIN(MD.sum_val) WHEN 2 THEN MIN(MD.cur_val) ELSE 0 END),   (CASE MD.rollup_type WHEN 0 THEN MAX(MD.max_val) WHEN 1 THEN MAX(MD.sum_val) WHEN 2 THEN MAX(MD.cur_val) ELSE 0 END),   SUM(MD.cur_val), SUM(MD.weight_value_square), SUM(MD.weight_value),   MAX(MD.hr_of_day), MAX(MD.hr_of_week), MAX(MD.hr_of_month) FROM metricdata_hour MD USE INDEX (PRIMARY) WHERE ((ts_min DIV 60) % <MOD_ID>) = <MOD_POS> AND MD.ts_min BETWEEN 24019441 AND 24019559 GROUP BY MD.ts_min, MD.metric_id, MD.application_component_instance_id ORDER BY MD.ts_min, MD.metric_id, MD.application_component_instance_id

##################



Error found in application logs corresponding to the insert statement:

####
[#|2015-09-02T08:18:50.606+0000|SEVERE|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=6407;_ThreadName=Thread-5;|failed doing  rollup: metricdata_hour_agg :
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key 'PRIMARY' doesn't exist in table 'MD'
               at sun.reflect.GeneratedConstructorAccessor540.newInstance(Unknown Source)
               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
               at java.lang.reflect.Constructor.newInstance(Unknown Source)
               at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
               at com.mysql.jdbc.Util.getInstance(Util.java:386)
               at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
               at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
####

Looking at metricdata_hour_agg table it does not have primary keys set but all other similar tables (like metricdata_min_agg) have primary keys.

###metricdata_hour_agg###
+-----------------------------------+-----------------------+------+-----+---------+-------+
| Field                             | Type                  | Null | Key | Default | Extra |
+-----------------------------------+-----------------------+------+-----+---------+-------+
| ts_min                            | int(11)               | NO   | MUL | NULL    |       |
| metric_id                         | int(11)               | NO   | MUL | NULL    |       |
| rollup_type                       | tinyint(4)            | NO   |     | NULL    |       |
| cluster_rollup_type               | tinyint(4)            | NO   |     | NULL    |       |
| application_component_instance_id | smallint(5) unsigned  | NO   |     | NULL    |       |
| application_id                    | smallint(5) unsigned  | NO   |     | NULL    |       |
| group_count_val                   | mediumint(8) unsigned | NO   |     | NULL    |       |
| count_val                         | bigint(20) unsigned   | NO   |     | NULL    |       |
| sum_val                           | bigint(20)            | NO   |     | NULL    |       |
| min_val                           | bigint(20)            | NO   |     | NULL    |       |
| max_val                           | bigint(20)            | NO   |     | NULL    |       |
| cur_val                           | bigint(20)            | NO   |     | NULL    |       |
| weight_value_square               | double                | NO   |     | NULL    |       |
| weight_value                      | double                | NO   |     | NULL    |       |
| hr_of_day                         | int(11)               | NO   |     | NULL    |       |
| hr_of_week                        | int(11)               | NO   |     | NULL    |       |
| hr_of_month                       | int(11)               | NO   |     | NULL    |       |
+-----------------------------------+-----------------------+------+-----+---------+———+
####

Please let me know your thoughts.

Thanks,

Claudiu

0 Karma

Arun_Dasetty
Super Champion

Hi Claudiu,

Apologies for delay in response, We could not locate the error "MySQLSyntaxErrorException" in logs provided check multiple times, but we could confirm that primary key column error should not be raising with successful installation it looks somehow db is either corrupted or some bug could be hitting. Can you please provide below details:

a) Confirm do you have cold backup of /var/appdynamics/appdynamics-controller/db/data if yes how much long back you have taken the backup

b) what other tables are corrupted or raised similar errors?

c) Outout of below commands:

shell> cd  /var/appdynamics/appdynamics-controller/bin/

shell> ./controller.sh login-db 

mysql> show create table metricdata_hour_agg \G;

mysql> show tables \G;

d) Output of below commands:

shell> ls -ltr /var/appdynamics/appdynamics-controller/bin/metricdata_hour_agg

shell> ls -ltr /var/appdynamics/appdynamics-controller/bin/metricdata_*

We planeed to drop and recreate affected tables post stopping controller appserver java process and see how it goes with restart of services, post collecting above details.

On a side note, do not run controller in DEBUG/FINE logging level for long hours this is additional overhead on controller instance.

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi Akumar,

Thanks for you response. Please find my answers below:

a) We have a test controller and it is not a major thing if we break it. DB is ~ 700 GB so it would take a while back-up it. But it is doable. As a side note I have run the upgrade on teh test controller firts and after on the live controller. The issue is reproducible on both instances. So i would assume we hit a bug.

b)For now metricdata_hour_agg is the only one that raised problems.

c) please find the attached sqlqueryfile

d) please find the attached metricdata file

Thanks for your help.

Cheers,

Claudiu

0 Karma

Arun_Dasetty
Super Champion

Hi Claudiu,

- We would like clarify that if there are any db errors before upgrade than it will "not" automatically will get fixed by upgrade process, we need to have db with no errors before upgrade process.

Try below steps and see how it goes:

a) If possible as it is test controller , stop both contorller java and mysql and take backup

OR take backup of below all files related to affected table name metricdata_hour_agg, files that have table file names metricdata_hour_agg after stopping controller appserver java and mysql process (contorller.sh stop-appserver, stop-db)

b) Now run below steps:

shell> ./controller.sh stop-appserver

shell> ./controller.sh stop-db

shell> ./controller.sh start-db

shell> ./controller.sh login-db

mysql> 

and issue attached sql at mysql prompt

c) and now issue below commands again

shell> ./controller.sh stop-db

shell> ./controller.sh start-db

shell> ./controller.sh start-appserver

Now see how it goes with error and controller ui accessibility.

Keep us posted if you see any issues 

Regards,

Arun

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi,

Thanks for you reply.

Applied the below changes and it seems that there is another table that is missing the PRIMARY key:

mysql> describe metricdata_hour;
+-----------------------------------+----------------------+------+-----+---------+-------+
| Field                             | Type                 | Null | Key | Default | Extra |
+-----------------------------------+----------------------+------+-----+---------+-------+
| ts_min                            | int(11)              | NO   | MUL | NULL    |       |
| metric_id                         | int(11)              | NO   | MUL | NULL    |       |
| rollup_type                       | tinyint(4)           | NO   |     | NULL    |       |
| cluster_rollup_type               | tinyint(4)           | NO   |     | NULL    |       |
| node_id                           | int(11)              | NO   |     | NULL    |       |
| application_component_instance_id | smallint(5) unsigned | NO   |     | NULL    |       |
| application_id                    | smallint(5) unsigned | NO   |     | NULL    |       |
| count_val                         | bigint(20) unsigned  | NO   |     | NULL    |       |
| sum_val                           | bigint(20)           | NO   |     | NULL    |       |
| min_val                           | bigint(20)           | NO   |     | NULL    |       |
| max_val                           | bigint(20)           | NO   |     | NULL    |       |
| cur_val                           | bigint(20)           | NO   |     | NULL    |       |
| weight_value_square               | double               | NO   |     | NULL    |       |
| weight_value                      | double               | NO   |     | NULL    |       |
| hr_of_day                         | int(11)              | NO   |     | NULL    |       |
| hr_of_week                        | int(11)              | NO   |     | NULL    |       |
| hr_of_month                       | int(11)              | NO   |     | NULL    |       |
+-----------------------------------+----------------------+------+-----+---------+-------+
17 rows in set (0.01 sec)

The error related to this would be:

[#|2015-09-16T22:30:45.056+0000|SEVERE|glassfish3.1.2|com.appdynamics.METRICS.WRITE|_ThreadID=4596;_ThreadName=Thread-5;|failed doing  rollup: metricdata_hour_agg :
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key 'PRIMARY' doesn't exist in table 'MD'
        at sun.reflect.GeneratedConstructorAccessor531.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
        at com.mysql.jdbc.Util.getInstance(Util.java:386)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2415)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2333)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2318)
        at com.sun.gjc.spi.base.PreparedStatementWrapper.executeUpdate(PreparedStatementWrapper.java:125)
        at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler.doRollupParallel(MetricDataRollupHandler.java:69)
        at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler.access$000(MetricDataRollupHandler.java:31)
        at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler$RollupTask.call(MetricDataRollupHandler.java:457)
        at com.singularity.ee.controller.beans.agent.db.MetricDataRollupHandler$RollupTask.call(MetricDataRollupHandler.java:435)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
|#]

I think we would need to recreate the table: metricdata_hour as well

Also I get the below error after restarting the controller:

[#|2015-09-17T06:58:27.933+0000|SEVERE|glassfish3.1.2|com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge|_ThreadID=1;_ThreadName=Thread-5;|TRUNCATE FAILED TO FINISH: ALTER TABLE metricdata_hour_agg TRUNCATE PARTITION PART
16330
java.sql.SQLException: Incorrect partition name
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
        at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1709)
        at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1628)
        at com.sun.gjc.spi.base.StatementWrapper.executeUpdate(StatementWrapper.java:117)
        at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.alterPartition(PerfDataPartitionPurge.java:379)
        at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsForShard(PerfDataPartitionPurge.java:346)
        at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsDirectly(PerfDataPartitionPurge.java:243)
        at com.singularity.ee.controller.beans.agent.db.MetricDataPurger.purgeMetricData(MetricDataPurger.java:353)
        at com.singularity.ee.controller.beans.agent.MetricDataWriterBean.processMetricPurges(MetricDataWriterBean.java:952)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

 to mitigate this one I am recreating the metricdata_hour_agg as well.

Thanks,

Claudiu

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi Arun,

Any chance you can provide us with a fix for this?

Perhaps re-creating the  metricdata_hour table would solve it, even if that means we'll loose the data in this table.

Other fix might save the date in this table - but any is welcomed now.

Thanks,

Claudiu

0 Karma

Arun_Dasetty
Super Champion

Hi Claudiu,

Please issue same steps for new table create script attached like we did earlier to recreate one and if issue persists on step wise in new logs send the list of tables , at this point we will loose old data, Just make sure we are not restarting controller in abnormal way by any chance which leads table corruption in future.

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi Arun,

Thanks for your response.

For some reason now we get a different error - even though we haev not affected the metricdata_hour_agg_app table

[#|2015-10-01T15:26:00.876+0000|SEVERE|glassfish3.1.2|com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge|_ThreadID=1;_ThreadName=Thread-5;|PURGING FAILED TO FINISH: ALTER TABLE metricdata_hour_agg_app REORGANIZE PARTITION PARTMAX INTO (   PARTITION PART16720 VALUES LESS THAN (24076800),   PARTITION PARTMAX VALUES LESS THAN MAXVALUE)
java.sql.SQLException: Duplicate partition name PART16720
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1709)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1628)
    at com.sun.gjc.spi.base.StatementWrapper.executeUpdate(StatementWrapper.java:117)
    at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.alterPartition(PerfDataPartitionPurge.java:443)
    at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsForShard(PerfDataPartitionPurge.java:296)
    at com.singularity.ee.controller.beans.agent.db.PerfDataPartitionPurge.purgePartitionsDirectly(PerfDataPartitionPurge.java:243)
    at com.singularity.ee.controller.beans.agent.db.MetricDataPurger.purgeMetricData(MetricDataPurger.java:353)
    at com.singularity.ee.controller.beans.agent.MetricDataWriterBean.processMetricPurges(MetricDataWriterBean.java:952)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1052)
    at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:1124)
    at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:5388)
    at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:619)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:800)
    at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:571)
    at com.singularity.ee.controller.beans.EntityManagerInterceptor.manageEntityManagerLifecycle(EntityManagerInterceptor.java:53)
    at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)

Controller fails to start due to this error.

Any thoughts on this?

Thanks,

Claudiu

0 Karma

CommunityUser
Splunk Employee
Splunk Employee

Hi,

Small correction on my previous post. Controller starts now and the 3 hour metric are working fine.

The only thing is the partition error we gent during the controller restart.

Thanks,

CLaudiu

0 Karma

Arun_Dasetty
Super Champion

Hi Claudiu,

Glad to hear that news from you finally, For new error lets recreate the table and hope this resolves the recent error as well(attached querry recreate table script, truncate and drop and create one )

0 Karma
Get Updates on the Splunk Community!

A Season of Skills: New Splunk Courses to Light Up Your Learning Journey

There’s something special about this time of year—maybe it’s the glow of the holidays, maybe it’s the ...

Announcing the Migration of the Splunk Add-on for Microsoft Azure Inputs to ...

Announcing the Migration of the Splunk Add-on for Microsoft Azure Inputs to Officially Supported Splunk ...

Splunk Observability for AI

Don’t miss out on an exciting Tech Talk on Splunk Observability for AI! Discover how Splunk’s agentic AI ...