Splunk Enterprise Security

Run Adaptive Response & Azure SAML's Lack of AQR

Pcktech
Loves-to-Learn Lots

Issue

With our Enterprise Security search head configured to use Azure SAML (no Authentication Extension specified yet), I discovered that "Run Adaptive Response" in Enterprise Security 6.4.0's Incident Review returned "Unexpected token < in JSON at position 0" when attempting to run any response (even Ping) with no data passed to the response. It was an immediate failure. Support noted that a HAR capture showed credentials weren't being passed, and pointed to Azure's lack of AQR (Attribute Query Request) support as the reason.

Backstory

While it was surprising that Enterprise Security had a single feature (so far discovered) relying on AQR, Azure's lack of AQR support was not a surprise, as I'd run into it when attempting to set up the Secure Gateway as well (which we gave up on as a secondary priority to finishing our installation). That same exposure also led us to discover, in the Secure Gateway documentation, a sample script meant to be used as a SAML Authentication Extension to overcome this lack of Azure support. Unfortunately, at the time, the script didn't seem to work -- after actually looking at its code I could tell why: the Splunk-provided sample expected an Azure API Key.

Solution

WARNING for Production Environments: If you attempt to use the Authentication Extension script, be advised that for as long as it is enabled and not working, your web session will time out after the Get User Info Time to Live period (3600s, i.e. 1 hour, by default) regardless of activity, because Splunk cannot re-validate your identity. When it times out, your cookie may be well and truly hosed and you'll need to clear cookies & cache to get back to the login page. Worst case scenario, you'll need to edit $SPLUNK_HOME/etc/system/local/authentication.conf manually to comment out or remove getUserInfoTtl, scriptFunctions, scriptPath, scriptSecureArguments, and scriptTimeout, then run $SPLUNK_HOME/bin/splunk restart to get back to the login page.
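If you end up in that worst case, the manual edit can be sketched in Python as below. This is only an illustration, not a Splunk-provided tool: the function name comment_out_settings and the "[saml]" stanza in the usage example are hypothetical, and it works on the file's text so you can review the result before writing anything back. Back up authentication.conf first; the restart is still required afterwards.

```python
import re

# Setting names taken from the warning above.
EXTENSION_KEYS = (
    "getUserInfoTtl",
    "scriptFunctions",
    "scriptPath",
    "scriptSecureArguments",
    "scriptTimeout",
)


def comment_out_settings(conf_text, keys=EXTENSION_KEYS):
    """Return conf_text with every 'key = value' line for the given keys commented out."""
    pattern = re.compile(r"^\s*(%s)\s*=" % "|".join(re.escape(k) for k in keys))
    out = []
    for line in conf_text.splitlines():
        # Lines that are already commented start with '#', so they do not match.
        out.append("# " + line if pattern.match(line) else line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Usage on a made-up fragment (the stanza name here is hypothetical):
sample = "[saml]\nscriptPath = azureScripted.py\nscriptTimeout = 10s\nentityId = splunk\n"
print(comment_out_settings(sample))
```

Only the five extension settings are touched; every other line in the file passes through unchanged.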

Additional Warning: Splunk Support does not support any of the following (they won't even try, they'll direct you to your Account Team). If anything happens you can curse my name, but I take no responsibility, etc. That said, this is the only way anyone (including Splunk's own documentation, see above) has mentioned for dealing with Azure's lack of AQR support.

Azure Prerequisites:

  1. If you have not already, go to portal.azure.com and, under Azure Active Directory > App Registrations, create an App for your Splunk instance. Please see Splunk's documentation regarding setting up SAML -- but be sure to download the certificate and XML file from Azure! That XML file can be uploaded into Splunk's SAML Configuration page to auto-populate almost everything.
  2. For those without an Azure API key, we'll use the Client Secret method... In portal.azure.com where SAML was configured for Splunk (Azure Active Directory > App Registrations > All Applications > search for your app name here):
    1. Ask your Azure Admin to create a Client Secret under "Certificates & Secrets"
    2. Ask your Azure Admin to then add "Microsoft Graph" APIs User.Read.All (vital), Group.Read.All (unconfirmed if needed), and GroupMember.Read.All (unconfirmed if needed) under "API Permissions". Then ask them to provide Admin Consent on the same page (click a button applying these changes, essentially).
  3. For those with an Azure API Key, unfortunately I can't provide a lot of detail below.
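To show why User.Read.All matters, here is a rough sketch of the kind of Microsoft Graph lookup a script like this performs once it has an access token. The helper name build_user_lookup is hypothetical and the real script may build its request differently. The USER_ENDPOINT value is the one quoted from the script later in this post, and the Bearer header is the standard way to present an OAuth 2.0 access token.

```python
from urllib.parse import quote

# Same endpoint the sample script defines near its top.
USER_ENDPOINT = 'https://graph.microsoft.com/v1.0/users/'


def build_user_lookup(username, access_token):
    """Return (url, headers) for fetching one user's record from Microsoft Graph."""
    # Keep '@' readable in the URL; anything else unsafe is percent-encoded.
    url = USER_ENDPOINT + quote(username, safe='@')
    headers = {'Authorization': 'Bearer ' + access_token}
    return url, headers


url, headers = build_user_lookup('jane.doe@example.com', 'TOKEN_PLACEHOLDER')
print(url)
```

If the application lacks permission (or Admin Consent) for this lookup, Graph rejects the request even though the token request itself succeeded, which is worth knowing when you're trying to figure out which step is failing.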

Enterprise Security Command Line:

For those with an Azure API Key (may require special permission to request one) use the provided sample script at $SPLUNK_HOME/share/splunk/authScriptSamples/azureScripted.py (confirmed for Splunk 8.1)

For those with a Client Secret (assigned to your Splunk SAML application, much easier to acquire) use the script from https://gist.github.com/vprasanth87/5bd091f0eb24c4919b938f0528ee93bc

Place a copy of one of the above scripts into the $SPLUNK_HOME/etc/auth/scripts directory.

Have a Web Proxy/Gateway?

For those with Web Proxies not using a Global Proxy value by some other means: 

  1. Open the $SPLUNK_HOME/etc/auth/scripts/azureScripted.py file in a file editor
  2. Find this line near the top of the script: USER_ENDPOINT = 'https://graph.microsoft.com/v1.0/users/'
  3. Below that, add the following lines:
    proxies = {
    "http" : "http://IPADDRESS:PORT",
    "https" : "http://IPADDRESS:PORT"
    }

  4. Search the script for every call made through "requests" and add "proxies=proxies" as an argument to each one.
    1. Example: access_token_response = requests.post(token_url, data=payload, verify=False, allow_redirects=False, headers=headers, proxies=proxies)
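As an alternative to editing every call by hand, the same idea can be sketched with functools.partial, which pre-binds the proxies argument once. This is a variation on the steps above, not what the sample script does; build_proxies and with_proxies are hypothetical helper names, and IPADDRESS and PORT remain placeholders for your own proxy.

```python
import functools


def build_proxies(host, port):
    """Return a proxies mapping in the shape the requests library accepts."""
    proxy_url = "http://{}:{}".format(host, port)
    return {"http": proxy_url, "https": proxy_url}


def with_proxies(request_func, proxies):
    """Wrap a requests-style callable so every call carries proxies=..."""
    return functools.partial(request_func, proxies=proxies)


# Inside azureScripted.py (where requests is already imported) this could look like:
#   post = with_proxies(requests.post, build_proxies("IPADDRESS", "PORT"))
#   access_token_response = post(token_url, data=payload, headers=headers)
```

Any call you forget to route through the wrapper still goes out without the proxy, so searching the script for "requests." is still worthwhile.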

Want a Debug Log for the Azure Script's Execution?

For those wanting to create a log file so you can see what the script is doing and where it's failing:

  1. Make sure "import logging" is among the imports at the top of the script, then between the "USER_ENDPOINT =" line and "def getUserInfo(args):" line add:
    1. logging.basicConfig(filename='azureScripted.log', level=logging.DEBUG)
  2. Then throughout the script you can add a logging call at whatever level you want (logging.debug(), logging.info(), logging.error(), etc.) to see what is happening at any given point and isolate where the script stops (errors out).
    1. Example: logging.info('Header: %s', access_token_response.headers)
    2. Example: logging.debug('Token: %s', tokens['access_token'])
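Two things are worth knowing about the logging steps above. A relative filename passed to logging.basicConfig is resolved against whatever working directory the process happens to have, so the log may not land where you expect. And a debug line that prints the whole token puts a live credential on disk. Here is a minimal sketch that addresses both, using hypothetical helpers configure_debug_log and redact. The log directory is your choice and nothing Splunk mandates.

```python
import logging
import os


def configure_debug_log(log_dir, filename="azureScripted.log"):
    """Send DEBUG-and-above messages to an explicit path and return that path."""
    log_path = os.path.join(log_dir, filename)
    logging.basicConfig(
        filename=log_path,
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    return log_path


def redact(secret, visible=4):
    """Keep only the first few characters of a secret, plus its length."""
    secret = str(secret)
    if len(secret) <= visible:
        return "*" * len(secret)
    return "{}...({} chars)".format(secret[:visible], len(secret))


# Example use inside the script, once a token has been fetched:
#   configure_debug_log("/some/writable/dir")
#   logging.debug('Token: %s', redact(tokens['access_token']))
```

Remember to remove or dial down the logging once the script works.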

Implementing the Azure SAML AQR Workaround...

For Client Secret users, edit the azureScripted.py file and make the following changes within the "def getUserInfo(args):" definition (I haven't tested whether they're all mandatory, I just know they work)...

  1. Above the line: token_url = "https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token"
    1. Add: azure_tenant = args['tenantId']
    2. Add: client_id = args['clientId']
    3. Add: client_secret = args['clientSecret']
  2. Replace the token_url line with:
    token_url = "https://login.microsoftonline.com/{}/oauth2/v2.0/token".format(azure_tenant)
  3. Comment out or remove the following lines:
    1. client_id = '${AZURE_SPLUNK_SSO_APP_ID}'
    2. client_secret = '{AZURE_SSO_APP_API_KEY}'
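Put together, the edited portion could look roughly like the sketch below. The three args lookups and the token_url line come straight from the steps above. The payload is an assumption on my part: it is the standard OAuth 2.0 client credentials request for the Microsoft identity platform, and the script you downloaded may build its payload differently. The wrapper function build_token_request is hypothetical and only exists so the snippet runs on its own.

```python
def build_token_request(args):
    """Return (token_url, payload) built from the Script Secure Arguments."""
    # Values now come from Splunk's secure arguments instead of being hard-coded.
    azure_tenant = args['tenantId']
    client_id = args['clientId']
    client_secret = args['clientSecret']

    token_url = "https://login.microsoftonline.com/{}/oauth2/v2.0/token".format(azure_tenant)

    # Assumed payload: standard client credentials grant scoped to Microsoft Graph.
    payload = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        'scope': 'https://graph.microsoft.com/.default',
    }
    return token_url, payload


# Placeholder values only; real ones are supplied by Splunk at run time.
token_url, payload = build_token_request(
    {'tenantId': 'TENANT', 'clientId': 'APP_ID', 'clientSecret': 'SECRET'}
)
print(token_url)
```

Reading the values from args keeps the secret out of the script file, which is the point of using Script Secure Arguments.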

Then in the Enterprise Security Web UI, go to Settings > Authentication Methods and click the SAML Settings link. Click the SAML Configuration button in the top right. Scroll down until you see the "Authentication Extensions" section header, and click the arrow to expand the section.

  1. Script Path: azureScripted.py
  2. Script Timeout: if left blank it will default to 10s, this seems to be sufficient
  3. Get User Info Time to Live: if left blank it will default to 3600s, this seems to be sufficient
  4. Script Functions: getUserInfo
  5. Script Secure Arguments: enter the key name below in the left column, value in the right column.
    1. For Client Secret, Key: clientId
    2. For Client Secret, Key: clientSecret
    3. For Client Secret, Key: tenantId
    4. For Azure API Key, Key: azureKey
  6. Click Save
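A mistyped key name in Script Secure Arguments only shows up later as a script failure, so a small guard at the top of getUserInfo can save some head-scratching. The sketch below is an optional addition, not part of the sample script. The helper name missing_secure_args is hypothetical, and it assumes args behaves like a dict, which the args['tenantId'] style lookups above suggest.

```python
# Key names as entered in the Script Secure Arguments section above.
REQUIRED_KEYS = {
    "client_secret": ("clientId", "clientSecret", "tenantId"),
    "api_key": ("azureKey",),
}


def missing_secure_args(args, method="client_secret"):
    """Return the expected key names that are absent or empty in args."""
    return [key for key in REQUIRED_KEYS[method] if not args.get(key)]


# Example: clientSecret was never entered in the UI.
print(missing_secure_args({"clientId": "APP_ID", "tenantId": "TENANT"}))
```

Logging the returned list (never the values themselves) makes it obvious which argument needs to be fixed in the UI.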

Validating

  1. Open the Enterprise Security app and go to Incident Review
  2. Click the down arrow next to any Notable entry you'd like to test with, then click Run Adaptive Response
  3. Choose a Response and fill it out, then click Run
  4. If the script is working as intended you should get a message about the response being successful instead of complaining about the token.
  5. If that works, wait over 1 hour with your session still open/active. Verify it doesn't kick you out, or at least that getting back in takes nothing more than clicking 'Refresh.' This depends on other settings in your environment, but as long as you don't get anything weird, like it trying to launch the 'None' app (doesn't exist) or throwing an HTTP 500 response, you should be good to go.

ebond_splunk
Splunk Employee

@Pcktech 

Hi! This issue was fixed in 6.4.1. Happy Splunkin!
