Splunk Enterprise

Replace values of host, before indexing, with data from event

nuaraujo
Path Finder

Hi everyone,

I’m currently working on extracting the webaclId field from AWS WAF logs and setting it as the host metadata in Splunk. However, I’ve been running into issues where the regex doesn’t seem to work, and Splunk throws the error:

 

Log Example:

Below is an obfuscated example of an event from the logs I’m working with:

 

 

{
  "timestamp": 1733490000011,
  "formatVersion": 1,
  "webaclId": "arn:aws:wafv2:region:account-id:regional/webacl/webacl-name/resource-id",
  "action": "ALLOW",
  "httpRequest": {
    "clientIp": "192.0.2.1",
    "country": "XX",
    "headers": [
      { "name": "Host", "value": "example.com" }
    ],
    "uri": "/v2.01/endpoint/path/resource",
    "httpMethod": "GET"
  }
}

 

 I want to extract the webacl-name from the webaclId field and set it as the host metadata in Splunk. For the above example, the desired host value should be: webacl-name

Here’s my current Splunk configuration:

inputs.conf:

[monitor:///opt/splunk/etc/tes*.txt]
disabled = false
index = test
sourcetype = aws:waf

 

props.conf:

 

[sourcetype::aws:waf]
TRANSFORMS-set_host = extract_webacl_name

 


transforms.conf:

 

[extract_webacl_name]
REGEX = \"webaclId\":\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/([^\/]+)\/
FORMAT = host::$1
DEST_KEY = MetaData:Host
SOURCE_KEY = _raw

 

 

What I’ve Tried:
I’ve validated the regex on external tools like regex101, and it works for the log structure.

For example, the regex successfully extracts webacl-name from:

"webaclId":"arn:aws:wafv2:region:account-id:regional/webacl/webacl-name/resource-id"


Manual rex Testing in Splunk:

 

index=test sourcetype=aws:waf 
| rex field=_raw "\"webaclId\":\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/(?<webacl_name>[^\/]+)\/" 
| table _raw webacl_name

 

 

Questions:

  1. Does my transforms.conf configuration have any issues I might be missing?
  2. Is there an alternative or more efficient way to handle this extraction and rewrite the host field?
  3. Are there any known limitations or edge cases with using JSON data for MetaData:Host updates?

    I’d greatly appreciate any insights or suggestions. Thank you for your help!

1 Solution

isoutamo
SplunkTrust

In props.conf, when the stanza applies to a sourcetype, use just the sourcetype name as the stanza name instead of adding the sourcetype:: prefix. In this case that means [aws:waf] rather than [sourcetype::aws:waf].


dural_yyz
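Motivator

Your example data has a space between the colon and the value ("webaclId": "arn..."), so the regex needs to account for it, for example with \s:

\"webaclId\":\s\"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional\/webacl\/([^\/]+)\/

Verified on regex101.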
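
To make the point about the space easy to reproduce outside Splunk, here is a minimal Python sketch that runs the regex from the question and the amended one against both forms of the field quoted in this thread. Python's re module is not the PCRE engine Splunk uses, but it should behave the same for these simple constructs. The helper extract_host and the \s* variant named TOLERANT are assumptions added for illustration; they are not part of either post.

```python
import re

# Two forms of the field taken from the thread: the sample event has a space
# after the colon, the compact string tested in the question does not.
SPACED = '"webaclId": "arn:aws:wafv2:region:account-id:regional/webacl/webacl-name/resource-id",'
COMPACT = '"webaclId":"arn:aws:wafv2:region:account-id:regional/webacl/webacl-name/resource-id"'

# Shared tail of the pattern. The posts escape " and / as \" and \/;
# the unescaped forms used here are equivalent.
ARN_TAIL = r'"[^:]+:[^:]+:[^:]+:[^:]+:[^:]+:regional/webacl/([^/]+)/'

ORIGINAL = re.compile(r'"webaclId":' + ARN_TAIL)     # regex from the question's transforms.conf
AMENDED = re.compile(r'"webaclId":\s' + ARN_TAIL)    # regex suggested in the reply above
TOLERANT = re.compile(r'"webaclId":\s*' + ARN_TAIL)  # assumption: \s* accepts both forms


def extract_host(pattern, text):
    """Return the first capture group (the web ACL name), or None if no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for label, pattern in [("original", ORIGINAL), ("amended", AMENDED), ("tolerant", TOLERANT)]:
        print(label, extract_host(pattern, SPACED), extract_host(pattern, COMPACT))
```

If the real events can arrive both with and without the space, the \s* form may be the safer choice for REGEX in transforms.conf, but that is worth checking against actual indexed data.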

 
