Getting Data In

Mutlivalue Field Extraction

nateloepker
Explorer

Hello,

I'm writing some field extractions for a Tomcat access log. The logging format is

"%{E M/d/y @ hh:mm:ss.S a z}t %h (%{X-Forwarded-For}i) > %A:%p "%r" %{requestBodyLength}r %D %s %B %I "%{Referer}i" "%{User-Agent}i" %u %S %{username}s %{sessionTracker}s"

The X-Forwarded Field has multiple headers, so multiple X-Forwarded-For IP's are being logged for a small, but important, percentage of these events.

An example log is

Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.000, 00.000.00.00, 00.000.00.00) > 00.000.00.0:0000 "PUT /uri/query/here HTTP/1.1" -  1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 "hxxps://url.splunk.com/" "user_agent" - - - -

How can I perform a multivalue field extraction to grab 0, 1, 2 or 3 x-forwarded-for IP's?

Labels (3)
0 Karma
1 Solution

nateloepker
Explorer

I solved it by using the max_match option in the rex command. The x-forwarded-fors were extracted into a multivalue field x_forwarded_single

| rex field=_raw "^(?P<timestamp>\w+\s\d+\/\d+\/\d+\s.\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?P<remote_hostname>\S+)\s\((?P<x_forwarded_for>[^\)]*)\)\s\>\s(?P<local_ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<local_port>[\d\-]+)\s\"(?<request>[^\"]+)\"\s(?<request_body_length>\S+)\s(?<time_milli>\S+)\s(?<http_status>\S+)\s(?<bytes_sent>\S+)\s(?<request_thread_name>\S+)\s\"(?<referer>[^\"\s]*)\"\s\"(?<user_agent>[^\"]*)\"\s(?<remote_user>\S+)\s(?<user_session_id>\S+)\s(?<username>\S+)\s(?<session_tracker>\S+)"
| rex field=request "(?<http_method>\w*)\s+(?<url>[^ ]*)\s+(?<http_version>[^\"]+)[^ \n]*"
| rex field=url "(?<uri_path>[^?]+)(?:(?<uri_query>\?.*))?"
| rex field=x_forwarded_for max_match=3 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

View solution in original post

dural_yyz
Motivator
| makeresults 
| eval tmp="Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.001, 00.000.00.01, 00.000.00.03) > 00.000.00.0:0000 \"PUT /uri/query/here HTTP/1.1\" - 1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 \"hxxps://url.splunk.com/\" \"user_agent\" - - - -"
| rex field=tmp "^(?<timestamp>\w+\s\d+\/\d+\/\d+\s\@\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?<remote_hostname>\S+)\s\((?<x_forwarded_for>[^\)]+).*$"
| table tmp timestamp remote_hostname x_forwarded_for
| eval x_forwarded_for=split(replace(x_forwarded_for,"\s",""),",")

Hello,

This will auto extract a variable number of x-forwarded-for addresses and place into a multi value field. 

0 Karma

nateloepker
Explorer

I solved it by using the max_match option in the rex command. The x-forwarded-fors were extracted into a multivalue field x_forwarded_single

| rex field=_raw "^(?P<timestamp>\w+\s\d+\/\d+\/\d+\s.\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?P<remote_hostname>\S+)\s\((?P<x_forwarded_for>[^\)]*)\)\s\>\s(?P<local_ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<local_port>[\d\-]+)\s\"(?<request>[^\"]+)\"\s(?<request_body_length>\S+)\s(?<time_milli>\S+)\s(?<http_status>\S+)\s(?<bytes_sent>\S+)\s(?<request_thread_name>\S+)\s\"(?<referer>[^\"\s]*)\"\s\"(?<user_agent>[^\"]*)\"\s(?<remote_user>\S+)\s(?<user_session_id>\S+)\s(?<username>\S+)\s(?<session_tracker>\S+)"
| rex field=request "(?<http_method>\w*)\s+(?<url>[^ ]*)\s+(?<http_version>[^\"]+)[^ \n]*"
| rex field=url "(?<uri_path>[^?]+)(?:(?<uri_query>\?.*))?"
| rex field=x_forwarded_for max_match=3 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Monitoring AI Agents with Splunk Observability Cloud

Let’s say I’m running a travel planning AI app in production. A user asks for three concise hotel options in ...

[Puzzles] Solve, Learn, Repeat: Tiling

This puzzle (first published here) is based on finding groups of tessellated tiles (inspired by floor tiles I ...

SOK it to Me: Top 3 Benefits of Using Splunk Operator on Kubernetes that’ll Make ...

    Thursday, July 9, 2026  |  11:00AM–12:00PM PDT Duration: 1 hour (includes Q&A) Managing can feel like a ...