Hi All,
We are trying to index a set of web servers that each serve many virtual hosts. Adding the virtual host to the access log is simple enough in Apache, by changing the LogFormat from
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
to
LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
However, this breaks the automatic field extraction that Splunk normally performs on Apache log files.
So I dug into /opt/splunk/etc/system/default/transforms.conf and found these lines
[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]?[[all:other]]
and in /opt/splunk/etc/system/default/props.conf found this
[access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
I can see that I just need to add [[nspaces:vhost]]\s++ to the start of the transforms.conf entry, but obviously I don't want to modify the defaults.
I tried to replicate the props.conf and transforms.conf entries in my own app, but it didn't seem to work.
my inputs.conf
[monitor:///etc/httpd/logs/access_log*]
sourcetype = vhost_access_combined
disabled = false
followTail = 0
host = development.server.com
index = webserver
my props.conf
[vhost_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
my transforms.conf
[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: vhost, clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]?[[all:other]]
Any ideas how to get this working?
I have more complex questions to follow about setting the Splunk host to the value of vhost in the log entry, but I will take this in baby steps first.
A Universal Forwarder does not perform any parsing. See:
http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders
OK, I eventually got it to work using the following:
inputs.conf
[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au
[monitor:///etc/httpd/logs/error_log*]
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au
props.conf
[advanced_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = advanced-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
transforms.conf
[all_lazy]
REGEX = .*?
[all]
REGEX = .*
[nspaces]
# matches one or more NON space characters
REGEX = \S+
[qstring]
#matches a quoted "string" - extracts an unnamed variable - name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?<>[^"]*+)"
[sbstring]
#matches a string enclosed in [] - extracts an unnamed variable - name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = \[(?<>[^\]]*+)\]
[bc_domain]
REGEX = (?<domain>\w++://[^/\s"]++)
[bc_uri]
# backwards compatible uri regex
# uri = path optionally followed by query [/this/path/file.js?query=part&other=var]
# path = root part followed by file [/root/part/file.part]
# Extracts: uri, uri_path, root, file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s?/]+)?)(?:\?(?<uri_query>[^\s]*))?)
[reqstr]
REGEX = [^\s"]++
[access-request]
# very relaxed regex for extracting fields from the request
REGEX = "\s*+[[reqstr:method]]?(?:\s++[[bc_uri]](?:\s++[[reqstr:version]])*)?\s*+"
[advanced-access-extractions]
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]\s++[[nspaces:req_process_time]]?[[all:other]]
It seems I needed to copy a lot of the supporting stanzas from /opt/splunk/etc/system/default/transforms.conf, which makes sense.
Another issue I ran into is that I have a primary index server, and the Apache log files are forwarded to it by a Universal Forwarder.
None of this worked while props.conf and transforms.conf were on the Universal Forwarder. I had to add them to the indexing server before the log files were parsed correctly.
This may become a problem, because I would like the virtual host in each log entry to become the Splunk host. The host is currently set on the Universal Forwarder in inputs.conf, but the virtual host is not extracted until the event reaches transforms.conf on the indexing server. I suspect that by then it is too late to set the Splunk host. I will create a new question for that, as it is outside the scope of this one.
Edit: The formatting rules here mangle pasted conf files, so the configs are a bit munted. If anyone needs the configs, message me (if that's possible on Splunkbase).
Did answers remove the leading slash on your \s++
because it's only showing 's++'?
Here is an example log line
developer.management.theclient.rdev.com 192.168.31.108 - stingray [06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" 200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
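For anyone who wants to sanity-check the field layout outside Splunk, here is a minimal Python sketch run against the example line above. It is only a rough approximation of the vhost-prefixed extraction and not Splunk's own transform: the pattern, the VHOST_COMBINED name and the parse_line helper are all made up for illustration, and it uses ordinary quantifiers instead of the possessive ones (\s++) in transforms.conf.

```python
import re

# Simplified stand-in for the vhost-prefixed extraction (assumption: this is
# NOT Splunk's regex, just the same field order written as one plain pattern).
VHOST_COMBINED = re.compile(
    r'^(?P<vhost>\S+)\s+'
    r'(?P<clientip>\S+)\s+'
    r'(?P<ident>\S+)\s+'
    r'(?P<user>\S+)\s+'
    r'\[(?P<req_time>[^\]]*)\]\s+'
    r'"(?P<method>\S+)\s+(?P<uri>\S+)(?:\s+(?P<version>[^"\s]+))?"\s+'
    r'(?P<status>\S+)\s+'
    r'(?P<bytes>\S+)'
    r'(?:\s+"(?P<referer>[^"]*)"(?:\s+"(?P<useragent>[^"]*)")?)?'
)

# The example log line from this thread.
SAMPLE = (
    'developer.management.theclient.rdev.com 192.168.31.108 - stingray '
    '[06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" '
    '200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"'
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = VHOST_COMBINED.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_line(SAMPLE)
    print(fields["vhost"], fields["clientip"], fields["status"], fields["bytes"])
```

If the vhost comes out as the first field here, the log format itself is fine and any remaining problem is more likely to be in where the props.conf and transforms.conf files are deployed.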