Hi everyone,
I’m currently working on extracting the webaclId field from AWS WAF logs and setting it as the host metadata in Splunk. However, I’ve been running into issues where the regex doesn’t seem to work, and Splunk throws the error:
Below is an obfuscated example of an event from the logs I’m working with:
{
  "timestamp": 1733490000011,
  "formatVersion": 1,
  "webaclId": "arn:aws:wafv2:region:account-id:regional/webacl/webacl-name/resource-id",
  "action": "ALLOW",
  "httpRequest": {
    "clientIp": "192.0.2.1",
    "country": "XX",
    "headers": [
      { "name": "Host", "value": "example.com" }
    ],
    "uri": "/v2.01/endpoint/path/resource",
    "httpMethod": "GET"
  }
}
I want to extract the webacl-name from the webaclId field and set it as the host metadata in Splunk. For the above example, the desired host value should be: webacl-name
Here’s my current Splunk configuration:
inputs.conf:
[monitor:///opt/splunk/etc/tes*.txt]
disabled = false
index = test
sourcetype = aws:waf
props.conf:
[sourcetype::aws:waf]
TRANSFORMS-set_host = extract_webacl_name
transforms.conf:
[extract_webacl_name]
REGEX = \"webaclId\":\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/([^\/]+)\/
FORMAT = host::$1
DEST_KEY = MetaData:Host
SOURCE_KEY = _raw
index=test sourcetype=aws:waf
| rex field=_raw "\"webaclId\":\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/(?<webacl_name>[^\/]+)\/"
| table _raw webacl_name
I’d greatly appreciate any insights or suggestions. Thank you for your help!
In props.conf, when the stanza name is a sourcetype, use just the sourcetype name instead of adding the sourcetype:: prefix. So the stanza should be [aws:waf], not [sourcetype::aws:waf].
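For reference, a minimal sketch of what the corrected props.conf stanza could look like, assuming the transform name and everything else stays as posted in the question:

```
# props.conf
# The stanza name is the bare sourcetype; no "sourcetype::" prefix.
[aws:waf]
TRANSFORMS-set_host = extract_webacl_name
```

Only source:: and host:: are used as prefixes in props.conf stanza names; a stanza without a prefix is matched against the sourcetype.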
\"webaclId\":\s\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/([^\/]+)\/
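Your example data has a space between the colon and the opening quote of the value ("webaclId": "), so the regex needs a \s at that position, as in the pattern above.
Verified on regex101.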
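Putting that together, the transforms.conf stanza might look like the sketch below. It keeps the stanza name and keys from the question; the one deviation is \s* in place of \s, which is my own assumption so that the pattern tolerates both compact JSON and JSON with a space after the colon.

```
# transforms.conf
[extract_webacl_name]
# \s* (an assumption, not from the thread) allows zero or more spaces after the colon.
# The five [^:]+ groups cover arn, aws, wafv2, region and account-id.
REGEX = \"webaclId\":\s*\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/([^\/]+)\/
FORMAT = host::$1
DEST_KEY = MetaData:Host
# _raw is the default SOURCE_KEY, so this line is optional.
SOURCE_KEY = _raw
```

Since this is an index-time transform, it should only take effect on the instance that parses the data (an indexer or heavy forwarder), after a restart, and only for events indexed from then on. Events that are already indexed keep their original host value.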