Multivalue Field Extraction

nateloepker
Explorer

Hello,

I'm writing some field extractions for a Tomcat access log. The logging format is:

"%{E M/d/y @ hh:mm:ss.S a z}t %h (%{X-Forwarded-For}i) > %A:%p "%r" %{requestBodyLength}r %D %s %B %I "%{Referer}i" "%{User-Agent}i" %u %S %{username}s %{sessionTracker}s"

The X-Forwarded-For field has multiple headers, so multiple X-Forwarded-For IPs are logged for a small, but important, percentage of these events.

An example log line is:

Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.000, 00.000.00.00, 00.000.00.00) > 00.000.00.0:0000 "PUT /uri/query/here HTTP/1.1" -  1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 "hxxps://url.splunk.com/" "user_agent" - - - -

How can I perform a multivalue field extraction to grab 0, 1, 2, or 3 X-Forwarded-For IPs?


nateloepker
Explorer

I solved it by using the max_match option of the rex command. The X-Forwarded-For addresses are extracted into a multivalue field, x_forwarded_single:

| rex field=_raw "^(?P<timestamp>\w+\s\d+\/\d+\/\d+\s.\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?P<remote_hostname>\S+)\s\((?P<x_forwarded_for>[^\)]*)\)\s\>\s(?P<local_ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<local_port>[\d\-]+)\s\"(?<request>[^\"]+)\"\s(?<request_body_length>\S+)\s(?<time_milli>\S+)\s(?<http_status>\S+)\s(?<bytes_sent>\S+)\s(?<request_thread_name>\S+)\s\"(?<referer>[^\"\s]*)\"\s\"(?<user_agent>[^\"]*)\"\s(?<remote_user>\S+)\s(?<user_session_id>\S+)\s(?<username>\S+)\s(?<session_tracker>\S+)"
| rex field=request "(?<http_method>\w*)\s+(?<url>[^ ]*)\s+(?<http_version>[^\"]+)[^ \n]*"
| rex field=url "(?<uri_path>[^?]+)(?:(?<uri_query>\?.*))?"
| rex field=x_forwarded_for max_match=3 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
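Note that max_match=3 stops after three matches. If an event could ever carry more than three forwarded addresses, setting max_match=0 removes the cap. The following is a minimal sketch only, assuming the parenthesized list has already been extracted into x_forwarded_for as in the search above; the makeresults sample value and the xff_count field name are illustrative and not part of the original search:

```
| makeresults
| eval x_forwarded_for="00.000.000.001, 00.000.00.01, 00.000.00.03"
| rex field=x_forwarded_for max_match=0 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| eval xff_count=mvcount(x_forwarded_single)
| table x_forwarded_for x_forwarded_single xff_count
```

With the sample value above, x_forwarded_single should hold three values. When the list contains nothing that looks like an IPv4 address, the rex simply does not create the field, which covers the zero-address case.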


dural_yyz
Motivator
Hello,

This will automatically extract a variable number of X-Forwarded-For addresses and place them in a multivalue field:

| makeresults 
| eval tmp="Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.001, 00.000.00.01, 00.000.00.03) > 00.000.00.0:0000 \"PUT /uri/query/here HTTP/1.1\" - 1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 \"hxxps://url.splunk.com/\" \"user_agent\" - - - -"
| rex field=tmp "^(?<timestamp>\w+\s\d+\/\d+\/\d+\s\@\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?<remote_hostname>\S+)\s\((?<x_forwarded_for>[^\)]+).*$"
| table tmp timestamp remote_hostname x_forwarded_for
| eval x_forwarded_for=split(replace(x_forwarded_for,"\s",""),",")
