Disclaimer: This is in no way supported and will break Splunk support. Don't use this in a production environment. There are better ways to solve this, I'm just not smart enough to figure them out. This hack always breaks at the next update, so extra care should be taken. I think it would be very nice if Splunk could support the desired behavior, maybe as an option or configuration.

There are three places that influence the maintenance window calculation:

service_health_metrics_monitor:

Original:

| mstats latest(alert_level) AS alert_level WHERE `get_itsi_summary_metrics_index` AND
`service_level_max_severity_metric_only` by itsi_kpi_id, itsi_service_id, kpi, kpi_importance
| lookup kpi_alert_info_lookup alert_level OUTPUT severity_label AS alert_name | `mark_services_in_maintenance`
| `reorganize_metrics_healthscore_results` | gethealth | `get_info_time_without_sid`
| lookup service_kpi_lookup _key AS itsi_service_id OUTPUT sec_grp AS itsi_team_id
| search itsi_team_id=*
| fields - alert_severity, color, kpi, kpiid, serviceid, severity_label, severity_value
| rename health_score AS service_health_score | eval is_null_alert_value=if(service_health_score="N/A", 1, 0),
service_health_score=if(service_health_score="N/A", 0, service_health_score)

This could be changed to:

Modified:

| mstats latest(alert_level) AS alert_level WHERE `get_itsi_summary_metrics_index` AND
`service_level_max_severity_metric_only` by itsi_kpi_id, itsi_service_id, kpi, kpi_importance
| lookup kpi_alert_info_lookup alert_level OUTPUT severity_label AS alert_name | `mark_services_in_maintenance`
| `reorganize_metrics_healthscore_results` | gethealth | `get_info_time_without_sid`
| lookup service_kpi_lookup _key AS itsi_service_id OUTPUT sec_grp AS itsi_team_id
| fields - alert_severity, color, kpi, kpiid, serviceid, severity_label, severity_value
| rename health_score AS service_health_score | `mark_services_in_maintenance` | eval is_null_alert_value=if(service_health_score="N/A", 1, 0),
service_health_score=if(service_health_score="N/A", 0, service_health_score), alert_level=if(is_service_in_maintenance=1 AND alert_level>-2,-2,alert_level)

I have added an extra call to the macro "mark_services_in_maintenance" and expanded the last eval to set alert_level to maintenance.

service_health_monitor:

Original:

`get_itsi_summary_index` host=atp-00pshs* `service_level_max_severity_event_only`
| stats latest(urgency) AS urgency latest(alert_level) AS alert_level latest(alert_severity) as alert_name latest(service) AS service latest(is_service_in_maintenance) AS is_service_in_maintenance latest(kpi) AS kpi by kpiid, serviceid
| lookup service_kpi_lookup _key AS serviceid OUTPUT sec_grp AS itsi_team_id
| search itsi_team_id=*
| gethealth
| `gettime`

Could be changed to:

Modified:

`get_itsi_summary_index` `service_level_max_severity_event_only`
| stats latest(urgency) AS urgency latest(alert_level) AS alert_level latest(alert_severity) as alert_name latest(service) AS service latest(is_service_in_maintenance) AS is_service_in_maintenance latest(kpi) AS kpi by kpiid, serviceid
| gethealth
| `gettime`
| `mark_services_in_maintenance`
| eval alert_level=if(is_service_in_maintenance=1 AND alert_level>-2,-2,alert_level), color=if(is_service_in_maintenance=1 AND alert_level=-2,"#5C6773",color), severity_label=if(is_service_in_maintenance=1 AND alert_level=-2,"maintenance",severity_label), alert_severity=if(is_service_in_maintenance=1 AND alert_level=-2,"maintenance",alert_severity)

Again an extra call to the macro "mark_services_in_maintenance", plus the eval at the bottom to set the service in maintenance. These two changes ensure the service appears in maintenance in the "Service Analyzer" and "Glass Tables". I think it also takes care of "Deep Dives", but they don't appear to turn dark grey.

In order to ensure correct calculation we also have to make changes to the "gethealth" search command. The Python script of interest is located at "SPLUNK_HOME/etc/apps/SA-ITOA/lib/itsi/searches/compute_health_score.py". Search for "If a dependent service is disabled, its health should not affect other services" and you should find code that looks like this:

for depends_on in service.get('services_depends_on', []):
    # If a dependent service is disabled, its health should not affect other services
    dependent_service_id = depends_on.get('serviceid')
    dependency_enabled = [
        svc.get('enabled', 1) for svc in self.all_services if dependent_service_id == svc.get('_key')
    ]
    if len(dependency_enabled) == 1 and dependency_enabled[0] == 0:
        continue
    for kpi in depends_on.get('kpis_depending_on', []):
        # Get urgencies for dependent services

What I want is to replicate the behavior of a disabled service:

for depends_on in service.get('services_depends_on', []):
    # If a dependent service is disabled, its health should not affect other services
    dependent_service_id = depends_on.get('serviceid')
    dependency_enabled = [
        svc.get('enabled', 1) for svc in self.all_services if dependent_service_id == svc.get('_key')
    ]
    if len(dependency_enabled) == 1 and dependency_enabled[0] == 0:
        continue
    # If a dependent service is in maintenance, its health should not affect other services - ATP
    maintenance_service_id = depends_on.get('serviceid')
    try:
        isinstance(self.maintenance_services, list)
    except:
        self.maintenance_services = None
    if self._is_service_currently_in_maintenance(maintenance_service_id):
        self.logger.info('ATP - is service in maintenance %s', self._is_service_currently_in_maintenance(maintenance_service_id))
        continue
    for kpi in depends_on.get('kpis_depending_on', []):
        # Get urgencies for dependent services

So I added a call to the existing function _is_service_currently_in_maintenance. Unfortunately this fails at first, because the maintenance_services attribute is un-initialized (hence the try/except block); after that it is just a simple check of whether the service we depend on is in maintenance, and if it is, we break out with continue.

Again, this is NOT supported in any way, should not be used in production, and will break at the next update.

Kind regards
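PS: The try/except workaround above can also be avoided with getattr(), which returns a default instead of raising when the attribute was never set. Below is a minimal standalone sketch of that pattern — the class, its constructor, and the list-based maintenance check are simplified stand-ins for illustration, not the real ITSI implementation:

```python
class HealthScoreSketch:
    """Toy stand-in for the compute_health_score maintenance check (not ITSI code)."""

    def __init__(self, maintenance_service_ids=None):
        # May legitimately be None, mirroring the un-initialized
        # attribute that the try/except block works around.
        self.maintenance_services = maintenance_service_ids

    def _is_service_currently_in_maintenance(self, service_id):
        # getattr() with a default avoids an AttributeError entirely
        # when maintenance_services was never set on the instance;
        # "or []" additionally covers the explicit-None case.
        services = getattr(self, 'maintenance_services', None) or []
        return service_id in services

    def dependencies_to_score(self, depends_on_list):
        # Drop dependencies that are in maintenance, mirroring the
        # "continue" added to the modified loop above.
        return [d for d in depends_on_list
                if not self._is_service_currently_in_maintenance(d.get('serviceid'))]


sketch = HealthScoreSketch(maintenance_service_ids=['svc-a'])
deps = [{'serviceid': 'svc-a'}, {'serviceid': 'svc-b'}]
print(sketch.dependencies_to_score(deps))  # prints [{'serviceid': 'svc-b'}]
```

The same one-line guard (`getattr(self, 'maintenance_services', None)`) could replace the try/except in the hack without changing its behavior.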