
Apache logfile with virtualhost added to logs

Hi All,

We are trying to index a set of web servers that have many virtual hosts on them. Adding the virtual host to the Apache log is simple enough: change the LogFormat from

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

to

LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
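For reference, this shows how the vcombined directives line up with field names. It is a minimal Python sketch, not Splunk's parser. The field names are assumptions modelled on Splunk's access fields (plus a hypothetical `request` field for the whole `%r` request line), and each regex is a simplification of what Apache actually writes.

```python
import re

# Minimal sketch: one regex piece per directive of
#   LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
# Field names are assumptions modelled on Splunk's access fields; "request" is hypothetical.
VCOMBINED_PARTS = [
    r"(?P<vhost>\S+)",             # %V  server name according to UseCanonicalName (the vhost)
    r"(?P<clientip>\S+)",          # %h  remote host
    r"(?P<ident>\S+)",             # %l  remote logname, usually "-"
    r"(?P<user>\S+)",              # %u  authenticated remote user, or "-"
    r"\[(?P<req_time>[^\]]*)\]",   # %t  time the request was received, in brackets
    r'"(?P<request>[^"]*)"',       # \"%r\"  first line of the request
    r"(?P<status>\d{3})",          # %>s final status code
    r"(?P<bytes>\S+)",             # %b  response size, "-" when no bytes were sent
    r'"(?P<referer>[^"]*)"',       # \"%{Referer}i\"
    r'"(?P<useragent>[^"]*)"',     # \"%{User-Agent}i\"
]

# Apache separates the directives with single spaces, so join on " ".
VCOMBINED_RE = re.compile("^" + " ".join(VCOMBINED_PARTS) + "$")


def parse_vcombined(line):
    """Return a dict of fields for one vcombined log line, or {} if it does not match."""
    match = VCOMBINED_RE.match(line)
    return match.groupdict() if match else {}
```

For the example log line posted later in this thread, parse_vcombined returns vhost "developer.management.theclient.rdev.com" and clientip "192.168.31.108". Because `%V` is the first directive, every other field shifts one token to the right, so any extraction anchored at the start of the line has to account for it.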

However, this breaks the automatic field extraction that Splunk used to do for Apache log files.

So I dug into /opt/splunk/etc/system/default/transforms.conf and found this stanza:

[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]
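The `[[stanza:field]]` references in that REGEX reuse other transforms.conf stanzas as building blocks, with the part after the colon naming the extracted field (as the default file's own comments on `[[qstring:name]]` and `[[sbstring:name]]` describe). The Python sketch below approximates that substitution so the expanded pattern can be inspected. BUILDING_BLOCKS is a hypothetical, simplified stand-in for the real stanzas; Python's `(?P<name>...)` syntax replaces the PCRE `(?<name>...)` groups Splunk uses, and plain `+` stands in for the possessive `++`.

```python
import re

# Hypothetical, simplified stand-ins for a few building-block stanzas in
# system/default/transforms.conf. Splunk writes these in PCRE with an empty
# named group "(?<>...)" that takes the name after the colon; here "{name}"
# marks that slot, using Python's "(?P<name>...)" syntax.
BUILDING_BLOCKS = {
    "nspaces": r"(?P<{name}>\S+)",
    "sbstring": r"\[(?P<{name}>[^\]]*)\]",
    "qstring": r'"(?P<{name}>[^"]*)"',
    "all": r"(?P<{name}>.*)",
}

# Matches a [[stanza:field]] reference inside a transform REGEX.
REFERENCE = re.compile(r"\[\[(\w+):(\w+)\]\]")


def expand(template):
    """Substitute every [[stanza:field]] with that stanza's regex, naming the group 'field'."""
    return REFERENCE.sub(
        lambda m: BUILDING_BLOCKS[m.group(1)].format(name=m.group(2)), template
    )


def compile_transform(template):
    """Expand the references and compile the result into a Python regex."""
    return re.compile(expand(template))
```

For example, expanding `^[[nspaces:clientip]]\s+[[nspaces:ident]]` gives `^(?P<clientip>\S+)\s+(?P<ident>\S+)`. References without a field, such as `[[access-request]]`, are left out of this sketch.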

and in /opt/splunk/etc/system/default/props.conf I found this:

[access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
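TIME_PREFIX is itself a regex, so the bracket is escaped (`\[`). It tells Splunk to start looking for the timestamp right after the first `[`, and MAX_TIMESTAMP_LOOKAHEAD limits how many characters past that point it looks. The Python sketch below imitates those two settings for an Apache `%t` timestamp; the hard-coded strptime format is an assumption, since Splunk detects the format itself.

```python
import re
from datetime import datetime

# Simplified approximation of the access_combined props.conf settings.
TIME_PREFIX = re.compile(r"\[")   # TIME_PREFIX = \[
MAX_TIMESTAMP_LOOKAHEAD = 128     # characters inspected after the prefix match
# Assumption: Apache's %t layout, e.g. 06/Jul/2011:12:33:21 +1000
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_timestamp(line):
    """Return the event time as a timezone-aware datetime, or None if none is found."""
    prefix = TIME_PREFIX.search(line)  # search, not match: the prefix may be anywhere
    if prefix is None:
        return None
    window = line[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    raw = window.split("]", 1)[0]  # %t is closed by "]"
    try:
        return datetime.strptime(raw, APACHE_TIME_FORMAT)
    except ValueError:
        return None
```

Because the prefix is located by searching rather than by position, a vhost token at the start of the line does not affect timestamping; only the field extraction is thrown off.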

I can see that I just need to add [[nspaces:vhost]]\s to the start of the transforms.conf REGEX, but obviously I don't want to modify the defaults.
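To see why the default extraction stops working, the Python sketch below compares a simplified version of the start of access-extractions with the same pattern prefixed by a vhost token. Both regexes are simplified assumptions rather than Splunk's expanded patterns, but they share the key property: they are anchored with `^`, so the first token on the line is taken as the client IP.

```python
import re

# Simplified assumption of how the default access-extractions REGEX begins:
# anchored at ^, so the first token is taken as clientip.
DEFAULT_START = re.compile(
    r"^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+\[(?P<req_time>[^\]]*)\]"
)
# The same pattern with a vhost token in front.
VHOST_START = re.compile(
    r"^(?P<vhost>\S+)\s+(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)"
    r"\s+\[(?P<req_time>[^\]]*)\]"
)


def extract(pattern, line):
    """Return the named groups for line, or {} when the pattern does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else {}
```

On a vcombined line the default pattern finds the user name where it expects `[`, so it fails outright, while the vhost version matches and puts the IP address back in clientip.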

I tried to replicate the props.conf and transforms.conf entries in my own app, but it just didn't seem to work.

My inputs.conf:

[monitor:///etc/httpd/logs/access_log*]
sourcetype = vhost_access_combined
disabled = false
followTail = 0
host = development.server.com
index = webserver

My props.conf:

[vhost_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

My transforms.conf:

[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: vhost, clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]
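One quick sanity check on a setup like this is that every REPORT-* value in props.conf names a stanza that actually exists in transforms.conf. The helper below is a hypothetical Python sketch that reads both files with configparser, which approximates Splunk's .conf syntax closely enough for this check. It only compares the files with each other; it cannot tell whether they are deployed on an instance that actually parses the data.

```python
import configparser


def load_conf(text):
    """Parse a Splunk-style .conf snippet as INI (an approximation of Splunk's syntax)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, e.g. "REPORT-access"
    parser.read_string(text)
    return parser


def missing_report_targets(props_text, transforms_text):
    """Return (sourcetype, transform) pairs whose REPORT-* target has no transforms.conf stanza."""
    props = load_conf(props_text)
    transforms = load_conf(transforms_text)
    missing = []
    for sourcetype in props.sections():
        for key, value in props.items(sourcetype):
            if not key.startswith("REPORT-"):
                continue
            # REPORT-<class> may list several transforms, separated by commas.
            for name in (part.strip() for part in value.split(",")):
                if name and not transforms.has_section(name):
                    missing.append((sourcetype, name))
    return missing
```

With the props.conf and transforms.conf above, the check returns an empty list, since REPORT-access points at the existing vhost-access-extractions stanza.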

Any ideas how to get this working?

I have more complex questions to follow about setting the Splunk host to the vhost value in each log entry, but I'll take this in baby steps first.

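The Universal Forwarder does not parse data, so the parsing and field-extraction settings in props.conf and transforms.conf need to be on the indexer (or search head), not on the forwarder.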

http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders

OK, I eventually got it to work using the following.

inputs.conf

[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

[monitor:///etc/httpd/logs/error_log*]
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

props.conf

[advanced_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = advanced-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

transforms.conf

[all_lazy]
REGEX = .*?

[all]
REGEX = .*

[nspaces]
# matches one or more NON space characters
REGEX = \S+

[qstring]
#matches a quoted "string" - extracts an unnamed variable - name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?<>[^"]*+)"

[sbstring]
#matches a string enclosed in [] - extracts an unnamed variable - name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = \[(?<>[^\]]*+)\]

[bc_domain]
REGEX = (?<domain>\w++://[^/\s"]++)

[bc_uri]
# backwards compatible uri regex
# uri = path optionally followed by query [/this/path/file.js?query=part&other=var]
# path = root part followed by file [/root/part/file.part]
# Extracts: uri, uri_path, root, file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]](?<file>[^\s?/]+)?)(?:\?(?<uri_query>[^\s]))?)

[reqstr]
REGEX = [^\s"]++

[access-request]
# very relaxed regex for extracting fields from the request
REGEX = "\s*+[[reqstr:method]]?(?:\s++[bc_uri])?\s+"

[advanced-access-extractions]
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]\s++[nspaces:req_process_time]?[[all:other]]

It seemed I needed to copy a lot of the supporting stanzas from /opt/splunk/etc/system/default/transforms.conf, which makes sense.

Another issue I ran into: I have a primary indexing server, and the Apache log files are forwarded to it by a Universal Forwarder.

Nothing worked while props.conf and transforms.conf were on the Universal Forwarder. I had to add them to the indexing server for the log files to be parsed correctly.

This is potentially going to be an issue, because I would like the virtual host in the log file to become the Splunk host. The host is currently set on the Universal Forwarder in inputs.conf, but I don't extract the virtual host until transforms.conf is applied on the indexing server, and I suspect by then it will be too late to set the Splunk host. Anyway, I'll create a new question for that, as it is out of scope for this one.

Edit: The formatting rules here are useless for pasting conf files, so the configs above may be a bit mangled. If anyone needs the configs, message me (if that's possible on Splunkbase).

Did Answers remove the leading backslash from your \s++? It was only showing 's++'.

Here is an example log line:

developer.management.theclient.rdev.com 192.168.31.108 - stingray [06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" 200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!