
Is there a specific log format for a web server behind a load balancer so that Splunk can parse access_combined logs as expected?

sunilsuresh
New Member

Dear Experts,

I am working on onboarding Apache web logs and mapping the data to the access_combined sourcetype, so that the fields are extracted as defined in the transforms file.

The web server sits behind a load balancer that forwards the requests. As a result, the client IP shows up as my load balancer's address, and the public IP is not extracted by the access_combined transforms.

However, the log format configured on the web server is:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Is there a specific log format for a web server behind a load balancer so that Splunk can parse the log as expected?

There is a difference between this log format and the out-of-the-box regular expression.

Here is a sample event we are getting:

87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] "GET /wps/contenthandler/rbg/!ut/p/digest!2jrll8SkahQpQlUhFJmocw/sp/mashup:ra:collection?soffset=0&eoffset=6&themeID=ZJ_JA28HB02IG0R20IVDOA2AO20G4&locale=ar&locale=en&mime-type=text%2Fcss&entry=wp_one_ui_30__0.0%3Ahead_css&entry=wp_one_ui_dijit_30__0.0%3Ahead_css&entry=wp_legacy_layouts__0.0%3Ahead_css&entry=wp_theme_portal_80__0.0%3Ahead_css&entry=wp_status_bar__0.0%3Ahead_css&entry=wp_portlet_css__0.0%3Ahead_css HTTP/1.1" 200 207302 "https://www.xxx.xxx.com.in/wps/portal/rbg/login" "Mozilla/5.0 (Linux; U; Android 4.1.1; ar-ae; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
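To illustrate the mismatch, here is a minimal Python sketch built on a simplified, hypothetical stand-in for a space-delimited combined-log pattern. It is not Splunk's actual built-in regex; it checks only the leading fields (client address, ident, user, timestamp). Because it expects the client address to be one token without spaces, the comma-separated address list at the start of the sample event prevents the remaining fields from lining up.

```python
import re

# Simplified, hypothetical stand-in for a space-delimited combined-log pattern.
# It is NOT Splunk's built-in regex; it only checks the leading fields:
# clientip, ident, user, and the bracketed timestamp.
SIMPLE_PREFIX = re.compile(
    r'^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\]'
)


def leading_fields(line):
    """Return the leading fields as a dict, or None if the pattern does not match."""
    m = SIMPLE_PREFIX.match(line)
    return m.groupdict() if m else None


# Sample event from the question, shortened after the timestamp.
xff_line = '87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] "GET / HTTP/1.1" 200 207302'
# The same event with a single client address in the first field.
single_ip_line = '87.109.30.200 - - [01/Jan/2015:00:06:00 +0300] "GET / HTTP/1.1" 200 207302'

print(leading_fields(xff_line))        # None: the extra address shifts every field
print(leading_fields(single_ip_line))  # clientip is 87.109.30.200
```

The second line shows that the same pattern lines up once the first field holds exactly one address.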

Any suggestions on changing the regex itself rather than doing a separate field extraction?

Thanks,

Sunil Suresh


chanfoli
Builder

Hello,

The X-Forwarded-For header is not part of the standard access_combined format. Because this header can contain a variable number of IP addresses, separated by commas and possibly spaces, you would need an extraction regex that expects a field with multiple IP addresses and assigns the first one to clientip. The spaces that might be present in that field are the hard part: the built-in extractions for Apache logs depend heavily on fields delimited by spaces, so an unquoted multivalued field containing spaces is going to be a problem. You would need a fairly advanced regex for it to work reliably without missing the other extractions for the event. I've looked around on Answers, and it seems that nobody has come up with a clean way to do what you are after.

You could quote the header in the log format string to make it easier to extract, but even then you still have a list of an unpredictable number of IPs in which the first is expected to be the actual client.
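As a rough illustration of the kind of regex described above, here is a minimal Python sketch. It assumes the header is logged unquoted at the start of the line, as in the LogFormat from the question, with addresses separated by a comma and optional spaces. It captures the first address as clientip, puts any remaining addresses in a hypothetical `proxies` group, and then matches the usual combined-log fields. The pattern uses Python's `re` syntax; it would need to be adapted and tested before being used as a Splunk transforms.conf extraction.

```python
import re

# Hypothetical pattern for the question's LogFormat:
#   %{X-Forwarded-For}i %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# The X-Forwarded-For list comes first; its first address is taken as clientip.
XFF_COMBINED = re.compile(
    r'^(?P<clientip>[^,\s]+)'                          # first address in the list
    r'(?:,\s*(?P<proxies>[^,\s]+(?:,\s*[^,\s]+)*))?'   # any further addresses
    r'\s+(?P<ident>\S+)\s+(?P<user>\S+)'
    r'\s+\[(?P<timestamp>[^\]]+)\]'
    r'\s+"(?P<method>\S+)\s+(?P<uri>\S+)\s+(?P<version>[^"]+)"'
    r'\s+(?P<status>\d{3})\s+(?P<bytes>\d+|-)'
    r'\s+"(?P<referer>[^"]*)"\s+"(?P<useragent>[^"]*)"'
)


def parse_xff_combined(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    m = XFF_COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Turn the remaining addresses into a list (empty when only one was logged).
    if fields["proxies"]:
        fields["proxies"] = [p.strip() for p in fields["proxies"].split(",")]
    else:
        fields["proxies"] = []
    return fields


# Sample event from the question, with the request URI and user agent shortened.
sample = ('87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] '
          '"GET /wps/contenthandler/rbg/... HTTP/1.1" 200 207302 '
          '"https://www.xxx.xxx.com.in/wps/portal/rbg/login" '
          '"Mozilla/5.0 (Linux; U; Android 4.1.1)"')

fields = parse_xff_combined(sample)
print(fields["clientip"], fields["proxies"], fields["status"])
```

For the shortened sample this prints `87.109.30.200 ['xx.xx.xx.xx'] 200`. Keying the first group on `[^,\s]+` rather than `\S+` is what lets the comma end the client address, so the spaces inside the list no longer throw off the later fields.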

I will keep looking for options when I find some time, but you may want to consider a different approach. We have our load balancer/CDN add a custom X- header containing the client's true IP, and we use that header as the clientip field in our log format string. This works pretty well.
