We are running Splunk Enterprise 6.5.3.
On a heavy forwarder we installed and configured the Splunk Add-on for Microsoft Cloud Services (current version 2.0.3) in the summer of last year and have been using it since.
Yesterday we stopped receiving any data in Splunk for that add-on.
There are a few errors and warnings in the Splunk internal index (sample errors to follow).
We also ran Linux patching on that heavy forwarder server, which required a reboot, right before the data stopped coming.
Any advice on how to approach this issue and possibly fix it would be appreciated.
Here are the patterns of errors and warnings:
Log_level=ERROR, pid=13456, tid=MainThread, file=config.py, func_name=log, code_line_no=51 | UCC Config Module: Fail to load value of "json" - endpoint=account_list, item=O365prod, field=refresh_token File "/export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/ms_o365_account_monitoring.py", line 286,.....
Log_level=WARNING, pid=13456, tid=MainThread, file=config.py, func_name=log, code_line_no=51 | UCC Config Module: Fail to load value of "json" - endpoint=account_list, item=O365prod, field=refresh_token - No JSON object could be decoded File "/export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/ms_o365_account_monitoring.py", line 286, in main() File "/export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/ms_o365_account_monitoring.py", line 278..
Log_level=WARNING, pid=13456, tid=MainThread, file=config.py, func_name=log, code_line_no=51 | UCC Config Module: Fail to load value of "json" - endpoint=account_list, item=O365prod, field=refresh_token - No JSON object could be decoded File "/export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/ms_o365_account_monitoring.py", line 286, in main() File "/export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/ms_o365_account_monitoring.py", line 278.....MalformedHeader: WWW-Authenticate
Log_level=ERROR, pid=13456, tid=MainThread, file=ms_o365_account_monitoring.py, func_name=run, code_line_no=146 | Failed to load conf files, reason: .......ConfigException: Fail to load value of "json" - endpoint=account_list, item=O365prod, field=refresh_token....
Personally, I prefer not to use any of the "Apps" or "TA"s for Microsoft Azure or O365 written by either Splunk or Microsoft.
Here is a mechanism I created that has not broken since I implemented it: https://answers.splunk.com/answers/678660/how-to-get-logs-from-azure-and-o365-into-splunk.html
I'm having the same issue and I think it has something to do with settings not getting copied properly.
Take a look at this set of messages that keeps occurring over and over again:
2018-01-11 16:42:21,667 +0000 log_level=INFO, pid=15095, tid=MainThread, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
2018-01-11 16:42:31,345 +0000 log_level=INFO, pid=15095, tid=Thread-4, file=dispatch_engine.py, func_name=_deploy_global_setting, code_line_no=612 | message="Deploy global setting:account_list$$f69356c0-5d8c-41bd-ab4e-2e575c1baff0_SplunkInt to forwarder:localhost success"
2018-01-11 16:42:40,356 +0000 log_level=INFO, pid=15095, tid=MainThread, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
2018-01-11 16:42:49,987 +0000 log_level=INFO, pid=15095, tid=Thread-5, file=dispatch_engine.py, func_name=_deploy_global_setting, code_line_no=612 | message="Deploy global setting:account_list$$f69356c0-5d8c-41bd-ab4e-2e575c1baff0_SplunkInt to forwarder:localhost success"
2018-01-11 16:42:53,992 +0000 log_level=INFO, pid=15095, tid=MainThread, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
2018-01-11 16:43:03,566 +0000 log_level=INFO, pid=15095, tid=Thread-2, file=dispatch_engine.py, func_name=_deploy_global_setting, code_line_no=612 | message="Deploy global setting:account_list$$f69356c0-5d8c-41bd-ab4e-2e575c1baff0_SplunkInt to forwarder:localhost success"
2018-01-11 16:43:12,578 +0000 log_level=INFO, pid=15095, tid=MainThread, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
2018-01-11 16:43:22,170 +0000 log_level=INFO, pid=15095, tid=Thread-3, file=dispatch_engine.py, func_name=_deploy_global_setting, code_line_no=612 | message="Deploy global setting:account_list$$f69356c0-5d8c-41bd-ab4e-2e575c1baff0_SplunkInt to forwarder:localhost success"
2018-01-11 16:43:31,180 +0000 log_level=INFO, pid=15095, tid=MainThread, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
@raugugliaro
1) So you also stopped receiving data from the Splunk add-on for MS CS? When did it start? For us it was 1/7/2018, somewhere between 4 and 7 pm Eastern time. I'm just trying to see whether it might have been caused by a change on Microsoft's side, in case we started getting the issue at the same time.
2) The message:
2018-01-02 04:15:08,577 +0000 log_level=INFO, pid=5217, tid=Thread-1, file=file_monitor.py, func_name=check_changes, code_line_no=48 | Detect /export/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/local/splunk_ta_ms_o365_server_accounts.conf has changed
I checked whether we had similar messages before the data stopped coming, and it seems we did. For example, on Jan 1st, 2018 data was still coming in and the message was already there. So I assume it's a standard message.
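If anyone wants to repeat that check, here is a rough Python sketch of it. It assumes the add-on log lines look like the ones quoted in this thread (date, time, offset, then key=value pairs, then " | " and the message). The names LINE_RE and count_per_day are made up for illustration and are not part of the add-on.

```python
import re
from collections import Counter

# Matches the log line layout quoted in this thread (an assumption about the format):
# "<date> <time>,<ms> <offset> log_level=..., ..., file=..., ... | <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2},\d{3} \S+ "
    r"log_level=(?P<level>\w+),.*?file=(?P<file>[^,]+),.*? \| (?P<msg>.*)$"
)


def count_per_day(lines, needle):
    """Count, per calendar day, the log lines whose message contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group("msg"):
            counts[match.group("date")] += 1
    return counts


# Usage (hypothetical): pass an open log file and the text to look for, e.g.
#   with open(path_to_addon_log) as handle:
#       print(count_per_day(handle, "has changed"))
```

If the message shows up on days when data was still arriving, it is probably not related to the outage.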
Ours stopped working around 10/30/2018.
We get these messages in splunk_ta_microsoft-cloudservices_account_monitoring.log:
2018-12-06 19:35:22,049 +0000 log_level=INFO, pid=123035, tid=MainThread, file=o365_refresh_token.py, func_name=get_updated_datas, code_line_no=557 | No account for account splunk_prod_o365 needs to be update by client_credentials
2018-12-06 19:35:22,049 +0000 log_level=INFO, pid=123035, tid=MainThread, file=o365_refresh_token.py, func_name=get_updated_datas, code_line_no=557 | No account for account splunk_prod_o365 needs to be update by refresh_token
The error logs seem to indicate a problem parsing the UCC config JSON at the endpoint "account_list".
Going through the add-on code, it seems to come from a problem parsing the "o365_schema.account_monitor_config.json" file under /bin/splunktamscs/o365_schema.account_monitor_config.json.
More specifically, the problem is with the account_list section and the refresh_token value.
You can try looking for a missing comma or missing quotes around "json", for example.
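A quick way to spot a missing comma or quote is to run the file through a JSON parser and read where it fails. Below is a minimal sketch, assuming a standalone Python 3 interpreter (not the Python bundled with Splunk). The function names find_json_error and check_file are illustrative only.

```python
import json


def find_json_error(text):
    """Return None if `text` is valid JSON, else a 'line X, column Y: reason' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return "line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)
    return None


def check_file(path):
    """Read a file as UTF-8 text and report the first JSON syntax error, if any."""
    with open(path, encoding="utf-8") as handle:
        return find_json_error(handle.read())


# Usage (adjust the path to your install):
#   print(check_file("o365_schema.account_monitor_config.json"))
```

An empty value fails too, so the same function can be used on a single field value copied out of a conf file, not just on the schema file.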
The default content for that config file is (fresh download of version 2.0.3):
{
    "_product": "Splunk_TA_microsoft-office365",
    "_rest_namespace": "splunk_ta_ms_o365",
    "_rest_prefix": "ta_o365_server_",
    "_protocol_version": "1.0",
    "_version": "1.0.0.0",
    "cert_setting": {
        "endpoint": "certificate"
    },
    "api_setting": {
        "endpoint": "#configs/conf-splunk_ta_ms_o365_api_settings",
        "field_types": {
            "*": {
                "api_url": "json",
                "data": "json"
            }
        }
    },
    "ucc_system_setting": {
        "endpoint": "#configs/conf-splunk_ta_ms_o365_server_ucc_system_setting",
        "field_types": {
            "o365_refresh_token": {
                "apis": "json",
                "url": "json"
            }
        }
    },
    "global_setting": {
        "endpoint": "settings",
        "field_types": {
            "proxy": {
                "enable": "bool",
                "dns_passthrough": "bool"
            }
        }
    },
    "account_list": {
        "endpoint": "accounts",
        "field_types": {
            "*": {
                "access_tokens": "json",
                "access_tokens_encrypted": "json",
                "refresh_token": "json"
            }
        }
    },
    "management_api_input_list": {
        "endpoint": "management_api_inputs"
    }
}
Hopefully that helps. If not, can you provide more of the errors/warnings, if there are any?
@damien_chillet Thank you for your reply!
The add-on was created by Splunk and it worked until Sunday, 1/7/2018, so I don't think that it is the parsing.
Unless Microsoft changed something on their side?
We found the following error starting around the time we stopped receiving data:
SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:676)
The error "SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed" might be a problem with how Python is doing SSL verifications on your machine. Have you recently updated your Python installation or changed your SSL Certificate Store?
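One way to narrow this down is to try a verified TLS handshake from the same server outside of Splunk. The sketch below assumes a standalone Python 3 interpreter, so it exercises the operating system's trust store and network path, not the Python bundled with Splunk. The function names and the host in the usage comment are assumptions for illustration; substitute whichever Microsoft endpoint your add-on actually calls.

```python
import socket
import ssl


def trust_store_paths():
    """Return the CA file/directory locations this Python uses by default."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


def check_handshake(host, port=443, timeout=10):
    """Attempt a certificate-verified TLS handshake.

    Returns (True, tls_version) on success or (False, error_text) on failure.
    """
    context = ssl.create_default_context()  # verifies certificate and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, str(exc)


# Usage (the host is an assumption, not taken from the add-on):
#   print(trust_store_paths())
#   print(check_handshake("login.microsoftonline.com"))
```

If the handshake fails here as well, the cause is more likely on the server (CA bundle changed by patching, a proxy intercepting TLS) than in the add-on itself.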
@raugugliaro, you know, the problem corrected itself; we haven't made any changes or anything. Makes me think that it was something on Microsoft's side.