Getting Data In

Apache access_combined with X-Forwarded-For instead of host

geraldhanks
New Member

In our organization, our Apache log files are of type access_combined, except that the host field is replaced with the value(s) from the X-Forwarded-For header because we use load balancers and other caching mechanisms.

This creates a situation where the host field ends up looking like:

xx.xx.xx.xx or

xx.xx.xx.xx, xx.xx.xx.xx or

xx.xx.xx.xx, xx.xx.xx.xx, xx.xx.xx.xx etc

I have seen log entries with as many as 5 host IPs in the X-Forwarded-For field. Can someone explain the process required to have Splunk correctly index the access logs given this variability in the log entries?


jgoddard
Path Finder

I have a solution that works pretty well, at least in our environments. I haven't tested it thoroughly against IPv6 addresses, but the few "fake" ones I am getting appear to come through.

Note, in our environments we have the ClientIP field for the webserver replaced by the X-Forwarded-For IP list if we get that header. So we took the default access-extractions REGEX from $SPLUNK_HOME/etc/system/default/transforms.conf, took out the [[nspaces:clientip]], and replaced it with the following:

(?<all_xff_ip>(([.\d]+|[a-fA-F0-9\:\.]+|-|localhost)(?:,\s)?)+)

Then we have a second transform to break the individual IPs out as needed.
#A multivalue definition to capture all the xff ips
[mv_xff_ip]
SOURCE_KEY=all_xff_ip
REGEX = (?P<clientip>[.:\d]+|[a-fA-F0-9:.]+|-|localhost)
MV_ADD = true
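
To sanity-check the two-step idea outside Splunk, here is a minimal Python sketch. It is not Splunk configuration: the sample log line, the helper name split_xff, and the constants ALL_XFF and ONE_IP are made up for illustration, and the hex character class is written as A-F. The first regex grabs the whole comma-separated list at the start of the line; the second pulls out one address per match, which is roughly what a transform with MV_ADD = true does.

```python
import re

# Step 1: capture the whole comma-separated list at the start of the line
# (same shape as the all_xff_ip group discussed above).
ALL_XFF = re.compile(
    r"^(?P<all_xff_ip>(?:(?:[.\d]+|[a-fA-F0-9:.]+|-|localhost)(?:,\s)?)+)"
)

# Step 2: match one address at a time inside the captured list.
ONE_IP = re.compile(r"[.:\d]+|[a-fA-F0-9:.]+|-|localhost")


def split_xff(line):
    """Return the list of addresses found at the start of an access log line."""
    m = ALL_XFF.match(line)
    if m is None:
        return []
    # findall returns every non-overlapping match, i.e. one entry per address.
    return ONE_IP.findall(m.group("all_xff_ip"))


# Hypothetical sample line, not taken from a real log.
sample = '10.0.0.1, 10.0.0.2, 192.168.1.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512'
print(split_xff(sample))
```

One thing this turned up: in Python's re engine, an IPv6 address that starts with digits, such as 2001:db8::1, comes out of the second regex as two pieces ("2001:" and "db8::1"), because the [.:\d]+ alternative is tried first and stops at the first letter. Whether Splunk's regex engine splits it the same way is worth checking against real events before relying on IPv6 values.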

dewoodruff
Path Finder

This answer pointed me in the right direction, but it was missing a piece of the puzzle: changes to props.conf. Here are all the configuration changes that had to be made to turn clientip into a multivalued field and correctly parse the X-Forwarded-For IPs:

transforms.conf - modified. The new part of the regex is the all_xff_ip group at the start, which replaces [[nspaces:clientip]]:

[access-extractions]
REGEX = ^(?<all_xff_ip>(([.\d]+|[a-fA-F0-9\:\.]+|-|localhost)(?:,\s)?)+)\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

# new section
[clientip]
SOURCE_KEY=all_xff_ip
REGEX = (?P<clientip>[.:\d]+|[a-fA-F0-9\:.]+|-|localhost)
MV_ADD = true

props.conf

# new section
[access_combined]
REPORT-access_combined_clientip = clientip

scottsavaresevi
Path Finder

There is more than likely a much better way to do this... But here is how I wound up solving it...

The actual access log entry will look like

IP1, IP2, IP3 - - [time] "GET url ..." ...

right?

I use the [ as an anchor with multiple regexes:

rex field=_raw "^(?<clientip>.*)\s+-\s+-\s+\["

and then another rex to parse the rest of the line (of course, the two - fields have meaning, and you may want to pull those in as variables as well). You can also do this in props.conf and transforms.conf by having multiple REPORT lines in props.conf that call multiple transforms.conf stanzas.
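
For anyone who wants to try the anchor idea before touching Splunk, here is a rough Python sketch of it. It is only an illustration under assumptions: the names PREFIX, xff_list, and forwarded_ips are hypothetical, the sample line is invented, and it uses \S+ for the two fields in front of the [ instead of literal dashes so that lines with a real user name still match.

```python
import re

# Everything before "<ident> <user> [" is treated as the forwarded-for list.
# The lazy .*? stops at the first position where two whitespace-separated
# tokens are followed by the opening bracket of the timestamp.
PREFIX = re.compile(r"^(?P<xff_list>.*?)\s+\S+\s+\S+\s+\[")


def forwarded_ips(line):
    """Split the leading comma-separated address list into individual values."""
    m = PREFIX.match(line)
    if m is None:
        return []
    return [ip.strip() for ip in m.group("xff_list").split(",")]


# Hypothetical sample line, not taken from a real log.
sample = '10.0.0.1, 10.0.0.2 - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 123'
print(forwarded_ips(sample))
```

Splitting on the comma afterwards plays the role of the second regex; in Splunk itself that step would be another rex or a transform rather than Python.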


marcoscala
Builder

Use a regex to override the field extraction and set the correct value, depending on the number of IPs in the different kinds of logs.


Yoyoda
New Member

Same issue for me with the X-Forwarded-For in the logs.
