We have been setting up SAML SSO on our stand-alone Splunk Enterprise 8.2.2.1 server, using our windows.net connector for AD integration.
This works on our test machine without issue. (We had trouble getting it to 100%, but such is the way with SAML and SSO.) We have tried to replicate the setup on our production server and keep getting the following error, which is not helpful:
02-08-2022 17:40:19.233 -0500 INFO loader [9134 MainThread] - SAML cert db registration with KVStore successful
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 2
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 3
02-08-2022 17:39:01.352 -0500 ERROR UserManagerPro [7659 SchedulerThread] - SAML config is invalid, Reconfigure it.
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 4
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder constructor for domain: samlv2, authType: SAML, refCnt: 4
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder constructor for domain: samlv2, authType: SAML, refCnt: 3
02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder constructor for domain: samlv2, authType: SAML, refCnt: 2
02-08-2022 17:39:01.191 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 2
02-08-2022 17:39:01.191 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 3
02-08-2022 17:39:01.191 -0500 ERROR UserManagerPro [7659 SchedulerThread] - SAML config is invalid, Reconfigure it.
We have enabled debug logging for the user manager and authentication components, but it adds no further detail.
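The excerpt above is listed newest first and is dominated by DEBUG constructor/destructor noise, so the one meaningful ERROR is easy to lose. The following is only a rough sketch for putting such lines in chronological order and dropping the DEBUG entries; the regular expression is inferred from the lines pasted above, and the names `parse_line` and `non_debug_events` are hypothetical helpers, not anything Splunk provides.

```python
import re
from datetime import datetime

# Layout inferred from the pasted lines (an assumption, not an official format spec):
# "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component [pid thread] - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<component>\S+) \[[^\]]*\] - (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
    return ts, m.group("level"), m.group("component"), m.group("msg")


def non_debug_events(lines):
    """Parse lines, drop DEBUG entries, and return the rest oldest first."""
    events = [e for e in map(parse_line, lines) if e is not None]
    return sorted((e for e in events if e[1] != "DEBUG"), key=lambda e: e[0])


# Three lines copied from the excerpt above.
sample = [
    "02-08-2022 17:40:19.233 -0500 INFO loader [9134 MainThread] - SAML cert db registration with KVStore successful",
    "02-08-2022 17:39:01.352 -0500 DEBUG AuthenticationManager [7659 SchedulerThread] - AuthProviderHolder destructor for domain: samlv2, authType: SAML, refCnt: 2",
    "02-08-2022 17:39:01.352 -0500 ERROR UserManagerPro [7659 SchedulerThread] - SAML config is invalid, Reconfigure it.",
]

for ts, level, component, msg in non_debug_events(sample):
    print(ts.isoformat(), level, component, msg)
```

Run against the full log, this would at least show whether anything other than the generic "SAML config is invalid" message is logged around the failure.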
authentication.conf looks like this (only the domains, names, GUIDs, etc. have been altered):
[authentication]
#authSettings = samlv2
#authType = SAML
authSettings = splunk
authType = splunk
[authenticationResponseAttrMap_SAML]
mail = http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress
realName = http://schemas.microsoft.com/identity/claims/displayname
role = http://schemas.microsoft.com/ws/2008/06/identity/claims/groups
[samlv2]
entityId = splunkEntityProdId
fqdn = https://mydomain.com
idpCertPath = idpCert.pem
idpSLOUrl = https://login.microsoftonline.com/8a4925a9-fd8e-4866-b31c-f/saml2
idpSSOUrl = https://login.microsoftonline.com/8a4925a9-fd8e-4866-b31c-f/saml2
inboundDigestMethod = SHA1;SHA256;SHA384;SHA512
inboundSignatureAlgorithm = RSA-SHA1;RSA-SHA256;RSA-SHA384;RSA-SHA512
issuerId = https://sts.windows.net/8a4925a9-fd8e-4866-b31c-f/
lockRoleToFullDN = true
nameIdFormat = urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress
redirectPort = 8000
replicateCertificates = true
signAuthnRequest = false
signatureAlgorithm = RSA-SHA1
signedAssertion = true
sloBinding = HTTP-POST
ssoBinding = HTTP-POST
allowPartialSignatures = true
[roleMap_SAML]
admin = b5ba2c9d-6b90-4e48-8746-16d52
GSP-Splunk-Prod-Admin = b5ba2c9d-6b90-4e48-8746-16d52
GSP-Splunk-Prod-Other = 5c89568d-d73f-4022-92b2-f9768
GSP-Splunk-Prod-PowerUser = 6f21a008-d90c-434c-aa48-7ae08
GSP-Splunk-Prod-User = 30a721f4-0281-410f-8e3b-7f9c
power = 6f21a008-d90c-434c-aa48-7ae08
user = 30a721f4-0281-410f-8e3b-7f9cc7
[userToRoleMap_SAML]
johndoe@none.com = admin;GSP-Splunk-Prod-Admin::John Doe::johndoe@none.com
[splunk_auth]
constantLoginTime = 0.000
enablePasswordHistory = 1
expireAlertDays = 15
expirePasswordDays = 90
expireUserAccounts = 1
forceWeakPasswordChange = 0
lockoutAttempts = 7
lockoutMins = 30
lockoutThresholdMins = 5
lockoutUsers = 1
minPasswordDigit = 0
minPasswordLength = 8
minPasswordLowercase = 0
minPasswordSpecial = 0
minPasswordUppercase = 0
passwordHistoryCount = 3
verboseLoginFailMsg = 1
Fixed this. The issue was that the IdP certificate was not available or could not be found. Once we swapped in the certificate for our prodtest server, the error changed to state that the certificate was invalid. We then regenerated the certificate for this server, replaced it, and everything started working as expected after a restart.
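For anyone hitting the same "SAML config is invalid" message, a quick pre-check of the file named by idpCertPath may save some time. The sketch below only confirms that the file exists and contains a decodable PEM certificate block; it does not verify expiry or that the certificate actually belongs to the IdP. The function name `check_idp_cert` and the example path are assumptions for illustration (the path assumes a default /opt/splunk install and that a relative idpCertPath resolves under etc/auth/idpCerts); adjust both to your environment.

```python
import ssl
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def check_idp_cert(path):
    """Return (ok, reason) for a PEM certificate file.

    Checks existence and PEM structure only, not expiry or trust.
    """
    p = Path(path)
    if not p.is_file():
        return False, "file not found"

    text = p.read_text(errors="replace")
    start = text.find(PEM_BEGIN)
    if start == -1:
        return False, "no PEM certificate block"
    stop = text.find(PEM_END, start)
    if stop == -1:
        return False, "PEM block is not terminated"

    pem = text[start:stop + len(PEM_END)]
    try:
        # Decodes the base64 body; raises ValueError on malformed input.
        der = ssl.PEM_cert_to_DER_cert(pem)
    except ValueError as exc:
        return False, f"cannot decode PEM body: {exc}"
    if not der:
        return False, "empty certificate body"
    return True, "PEM certificate block found"


# Hypothetical location; substitute your own $SPLUNK_HOME and idpCertPath value.
print(check_idp_cert("/opt/splunk/etc/auth/idpCerts/idpCert.pem"))
```

A "file not found" result corresponds to the first failure described above; a certificate that passes this check can still be rejected by Splunk as invalid, as happened with the one from the other server.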