Multivalue Field Extraction

nateloepker
Explorer

Hello,

I'm writing some field extractions for a Tomcat access log. The logging format is

"%{E M/d/y @ hh:mm:ss.S a z}t %h (%{X-Forwarded-For}i) > %A:%p "%r" %{requestBodyLength}r %D %s %B %I "%{Referer}i" "%{User-Agent}i" %u %S %{username}s %{sessionTracker}s"

Requests can carry multiple X-Forwarded-For values, so multiple X-Forwarded-For IPs are logged for a small but important percentage of these events.

An example log line is:

Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.000, 00.000.00.00, 00.000.00.00) > 00.000.00.0:0000 "PUT /uri/query/here HTTP/1.1" -  1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 "hxxps://url.splunk.com/" "user_agent" - - - -

How can I perform a multivalue field extraction to grab 0, 1, 2, or 3 X-Forwarded-For IPs?

nateloepker
Explorer

I solved it by using the max_match option of the rex command. The X-Forwarded-For addresses are first captured as one string in x_forwarded_for, and the last rex then extracts each address into the multivalue field x_forwarded_single.

| rex field=_raw "^(?P<timestamp>\w+\s\d+\/\d+\/\d+\s.\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?P<remote_hostname>\S+)\s\((?P<x_forwarded_for>[^\)]*)\)\s\>\s(?P<local_ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<local_port>[\d\-]+)\s\"(?<request>[^\"]+)\"\s(?<request_body_length>\S+)\s(?<time_milli>\S+)\s(?<http_status>\S+)\s(?<bytes_sent>\S+)\s(?<request_thread_name>\S+)\s\"(?<referer>[^\"\s]*)\"\s\"(?<user_agent>[^\"]*)\"\s(?<remote_user>\S+)\s(?<user_session_id>\S+)\s(?<username>\S+)\s(?<session_tracker>\S+)"
| rex field=request "(?<http_method>\w*)\s+(?<url>[^ ]*)\s+(?<http_version>[^\"]+)[^ \n]*"
| rex field=url "(?<uri_path>[^?]+)(?:(?<uri_query>\?.*))?"
| rex field=x_forwarded_for max_match=3 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
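For anyone adapting this, here is a minimal sketch that isolates the last rex step so it can be run on its own. It assumes a made-up x_forwarded_for value built with makeresults (addresses borrowed from the masked sample) rather than real events, and forwarded_count is a hypothetical helper field added only for illustration. It uses max_match=0, which rex treats as unlimited matches, instead of 3, in case an event ever carries more than three addresses.

```
| makeresults
| eval x_forwarded_for="00.000.000.001, 00.000.00.01, 00.000.00.03"
| rex field=x_forwarded_for max_match=0 "(?<x_forwarded_single>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| eval forwarded_count=mvcount(x_forwarded_single)
| table x_forwarded_for x_forwarded_single forwarded_count
```

With this sample value, x_forwarded_single should come back as a multivalue field holding the three addresses. When nothing matches, rex leaves the field unset, so mvcount returns null rather than 0.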

dural_yyz
Communicator
| makeresults 
| eval tmp="Thu 1/18/2024 @ 06:52:30.918 PM UTC 00.000.00.000 (00.000.000.001, 00.000.00.01, 00.000.00.03) > 00.000.00.0:0000 \"PUT /uri/query/here HTTP/1.1\" - 1270 200 3466 https-openssl-nio-00.000.00.0-000-exec-15 \"hxxps://url.splunk.com/\" \"user_agent\" - - - -"
| rex field=tmp "^(?<timestamp>\w+\s\d+\/\d+\/\d+\s\@\s\d+:\d+:\d+\.\d+\s\w+\s\w+)\s(?<remote_hostname>\S+)\s\((?<x_forwarded_for>[^\)]+).*$"
| eval x_forwarded_for=split(replace(x_forwarded_for,"\s",""),",")
| table tmp timestamp remote_hostname x_forwarded_for

Hello,

This automatically extracts a variable number of X-Forwarded-For addresses and places them into a multivalue field.
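One case worth checking with the split approach is an event with no X-Forwarded-For header at all. The sketch below assumes the parentheses then contain a single "-", as the other empty fields in the sample log do; that is an assumption about the log, not something confirmed in this thread. It keeps only values that look like IPv4 addresses, and forwarded_count is a hypothetical helper field.

```
| makeresults
| eval x_forwarded_for="-"
| eval x_forwarded_for=split(replace(x_forwarded_for,"\s",""),",")
| eval x_forwarded_for=mvfilter(match(x_forwarded_for,"^\d{1,3}(\.\d{1,3}){3}$"))
| eval forwarded_count=coalesce(mvcount(x_forwarded_for),0)
| table x_forwarded_for forwarded_count
```

mvfilter returns null when no value passes the test, so the coalesce turns the missing count into 0 for the "no header" case.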
