
Is there a specific log format for a web server behind a load balancer so that Splunk can parse access_combined logs as expected?


Dear Experts,

I am onboarding Apache web logs and mapping the data to the access_combined sourcetype so that the data is parsed and the fields are extracted according to the transforms file.

The web server sits behind a load balancer that forwards the requests. The client IP now shows up as the load balancer's IP, and the public client IP is not extracted by the access_combined transforms.

However, the log format configured on the web server is:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Is there a specific log format for a web server behind a load balancer so that Splunk can parse the log as expected?

This log format differs from what the out-of-the-box regular expression expects.

Here is a sample log line we are getting:

87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] "GET /wps/contenthandler/rbg/!ut/p/digest!2jrll8SkahQpQlUhFJmocw/sp/mashup:ra:collection?soffset=0&eoffset=6&themeID=ZJ_JA28HB02IG0R20IVDOA2AO20G4&locale=ar&locale=en&mime-type=text%2Fcss&entry=wp_one_ui_30__0.0%3Ahead_css&entry=wp_one_ui_dijit_30__0.0%3Ahead_css&entry=wp_legacy_layouts__0.0%3Ahead_css&entry=wp_theme_portal_80__0.0%3Ahead_css&entry=wp_status_bar__0.0%3Ahead_css&entry=wp_portlet_css__0.0%3Ahead_css HTTP/1.1" 200 207302 "https://www.xxx.xxx.com.in/wps/portal/rbg/login" "Mozilla/5.0 (Linux; U; Android 4.1.1; ar-ae; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
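To illustrate why a space-delimited extraction struggles with this line, here is a minimal Python sketch. The pattern `NAIVE` is a hypothetical, simplified approximation of the start of an Apache combined line (client IP, ident, user, bracketed timestamp); it is not Splunk's actual access-extractions regex. Both log lines are trimmed illustrations, and 203.0.113.7 is a documentation-range address.

```python
import re

# Hypothetical, simplified space-delimited pattern for the start of an
# Apache "combined" line. NOT Splunk's real access-extractions regex.
NAIVE = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\]'
)

# A line where the first field is a single IP (what %h would produce).
standard = '203.0.113.7 - - [01/Jan/2015:00:06:00 +0300] "GET / HTTP/1.1" 200 512 "-" "curl/7.0"'
# A trimmed version of the sample: the first field is an X-Forwarded-For list.
forwarded = '87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] "GET / HTTP/1.1" 200 512 "-" "curl/7.0"'

print(NAIVE.match(standard) is not None)   # True
print(NAIVE.match(forwarded) is not None)  # False
```

In the second line the extra ", xx.xx.xx.xx" token shifts every space-delimited field by one position, so the pattern hits "-" where it expects the opening "[" of the timestamp and the whole match fails.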

Any suggestions on changing the regex itself rather than creating a separate field extraction?

Thanks,

Sunil Suresh


Hello,

The X-Forwarded-For header is not part of the standard access_combined format. Because this header can contain a variable number of IP addresses separated by commas, you will need an extraction regex that expects a field with multiple IP addresses and extracts the first one as clientip. Spaces may also appear in that field, which makes this tricky: you would need a fairly advanced regex to extract it reliably without missing the other extractions for the event. I've looked around on Answers, and it seems that nobody has come up with a clean way to do what you are after.

The built-in extractions for Apache logs depend heavily on fields delimited by spaces, so an unquoted multivalued field containing spaces is going to be a problem. You could quote the header in the log format string to make it easier to extract, but you would still have a list of an unpredictable number of IPs, where the first one is expected to be the actual client.
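As a rough sketch of the "fairly advanced regex" idea, the Python pattern below treats the leading X-Forwarded-For value as a comma-separated list, takes the first entry as clientip, and keeps the remaining hops in a separate group. The names `XFF_COMBINED`, `parse` and the `proxies` group are hypothetical and chosen for illustration; the pattern is not a Splunk default, it assumes no escaped quotes inside the quoted fields, and turning it into a Splunk props/transforms extraction has not been tested here.

```python
import re

# Hypothetical pattern for the LogFormat in the question, where the first
# field is the raw X-Forwarded-For list, e.g. "87.109.30.200, 10.0.0.5".
# Sketch only; not a Splunk default extraction.
XFF_COMBINED = re.compile(
    r'^(?P<clientip>[^,\s]+)'                    # first entry = original client
    r'(?P<proxies>(?:,\s*[^,\s]+)*)'             # zero or more ", <hop>" entries
    r'\s+(?P<ident>\S+)\s+(?P<user>\S+)'
    r'\s+\[(?P<timestamp>[^\]]+)\]'
    r'\s+"(?P<request>[^"]*)"'                   # assumes no escaped quotes inside
    r'\s+(?P<status>\d{3})\s+(?P<bytes>\d+|-)'
    r'\s+"(?P<referer>[^"]*)"'
    r'\s+"(?P<useragent>[^"]*)"'
)


def parse(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = XFF_COMBINED.match(line)
    return m.groupdict() if m else None


# The sample line from the question, with the long query string trimmed.
sample = (
    '87.109.30.200, xx.xx.xx.xx - - [01/Jan/2015:00:06:00 +0300] '
    '"GET /wps/contenthandler/rbg/!ut/p/digest!2jrll8SkahQpQlUhFJmocw/sp/mashup:ra:collection?soffset=0&eoffset=6 HTTP/1.1" '
    '200 207302 "https://www.xxx.xxx.com.in/wps/portal/rbg/login" '
    '"Mozilla/5.0 (Linux; U; Android 4.1.1; ar-ae; GT-N7100 Build/JRO03C) AppleWebKit/534.30 '
    '(KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"'
)

fields = parse(sample)
print(fields["clientip"])                 # 87.109.30.200
print(fields["proxies"])                  # , xx.xx.xx.xx
print(fields["status"], fields["bytes"])  # 200 207302
```

The key design choice is that the client IP ends at the first comma or whitespace, and the optional hops must each start with a comma. A list like "a, b" therefore cannot shift ident, user and the timestamp, while a line with a single IP still matches with an empty `proxies` group.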

I will keep looking for options when I find some time, but you may want to consider a different path. We have our load balancer/CDN add a custom X- header containing the client's true IP, and we use the contents of this header as the clientip field in our log format string. This works pretty well.
