Apache logfile with virtualhost added to logs

Hi All,

We are trying to index a set of web servers that each host many virtual hosts. Adding the virtual host name to the logs is simple enough in Apache by changing the LogFormat from


LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

to

LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined

However, this breaks the automatic field extraction that Splunk normally does for Apache log files.
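To illustrate why the extra leading field breaks things, here is a rough Python sketch. It is not Splunk's actual transform, just a simplified stand-in for the start of a combined-format pattern (client IP, ident, user, bracketed timestamp). The log lines and the `COMBINED_HEAD` name are made up for illustration.

```python
import re

# Simplified stand-in for the start of a combined-format pattern:
# clientip, ident, user, then the bracketed timestamp.
COMBINED_HEAD = re.compile(
    r'^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+\[(?P<req_time>[^\]]*)\]'
)

# Made-up log lines, for illustration only.
combined_line = '192.0.2.10 - alice [06/Jul/2011:12:33:21 +1000] "GET / HTTP/1.1" 200 166'
vcombined_line = 'www.example.com ' + combined_line  # same line with %V prepended

matches_combined = COMBINED_HEAD.match(combined_line) is not None
matches_vcombined = COMBINED_HEAD.match(vcombined_line) is not None

print(matches_combined)   # True
print(matches_vcombined)  # False: the fourth token is the user, not "[...]"
```

With the vhost in front, every field shifts one position to the right, so the pattern reaches the user name where it expects the opening bracket of the timestamp and the whole match fails.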

So I dug into /opt/splunk/etc/system/default/transforms.conf and found these lines

[access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

and in /opt/splunk/etc/system/default/props.conf found this


[access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

I can see I just need to add a [[nspaces:vhost]]\s++ to the start of the transforms.conf regex, but obviously I don't want to mess with the defaults.

I tried to replicate what I saw in props.conf and transforms.conf in my own app, but it didn't seem to work.

my inputs.conf

[monitor:///etc/httpd/logs/access_log*]
sourcetype = vhost_access_combined
disabled = false
followTail = 0
host = development.server.com
index = webserver

my props.conf

[vhost_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

my transforms.conf

[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: vhost, clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

Any ideas how to get this working?

I have a more complex follow-up question about setting the Splunk host to the value of vhost from the log entry, but I will take this in baby steps first.

A Universal Forwarder does not do any parsing.

http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders


OK, I eventually got it to work using the following:

inputs.conf


[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

[monitor:///etc/httpd/logs/error_log*]
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

props.conf


[advanced_access_combined]
pulldown_type = true
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = advanced-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[

transforms.conf


[all_lazy]
REGEX = .*?

[all]
REGEX = .*

[nspaces]
# matches one or more NON space characters
REGEX = \S+

[qstring]
#matches a quoted "string" - extracts an unnamed variable - name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?<>[^"]*+)"

[sbstring]
#matches a string enclosed in [] - extracts an unnamed variable - name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = \[(?<>[^\]]*+)\]

[bc_domain]
REGEX = (?<domain>\w++://[^/\s"]++)

[bc_uri]
# backwards compatible uri regex
# uri = path optionally followed by query [/this/path/file.js?query=part&other=var]
# path = root part followed by file [/root/part/file.part]
# Extracts: uri, uri_path, root, file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*))?)

[reqstr]
REGEX = [^\s"]++

[access-request]
# very relaxed regex for extracting fields from the request
REGEX = "\s*+[[reqstr:method]]?(?:\s++[[bc_uri]](?:\s++[[reqstr:version]])*)?\s*+"

[advanced-access-extractions]
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]\s++[[nspaces:reqprocesstime]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

It seems I needed to copy a lot of the supporting stanzas from /opt/splunk/etc/system/default/transforms.conf, which makes sense.

Another issue I ran into is that I have a primary index server, and the Apache log files are forwarded to it by a Universal Forwarder.

Nothing worked while props.conf and transforms.conf were on the Universal Forwarder. I needed to put them on the indexing server for the log files to be parsed correctly.

This is potentially going to be an issue, as I would like the virtual host from the log entry to be used as the Splunk host. The host is currently set in inputs.conf on the Universal Forwarder, but the virtual host isn't extracted until the event reaches transforms.conf on the indexing server, and I think by then it will be too late to set the Splunk host. Anyway, I will create a new question for that as it is out of scope for this one.

Edit: The formatting rules here mangle pasted conf files, so the configs above may be a bit munted. If someone needs the configs, message me (if that's possible with splunkbase).


Did Answers strip the leading backslashes from your regexes when you posted (e.g. \s++ showing up as 's++')?
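For anyone copying regexes out of a post like this, a quick Python sketch shows why a stripped backslash matters. It uses Python's `re` module rather than Splunk's regex engine, and the `token` string is just an illustrative value, but the effect is the same: without the backslash, `S+` means "one or more literal capital S characters", not "one or more non-space characters".

```python
import re

token = "192.168.31.108 - stingray"

# With the backslash: \S+ matches a run of non-space characters.
with_backslash = re.match(r'\S+', token)

# Without it: S+ only matches literal 'S' characters, so nothing matches here.
without_backslash = re.match(r'S+', token)

print(with_backslash.group())   # 192.168.31.108
print(without_backslash)        # None
```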

Here is an example log line

developer.management.theclient.rdev.com 192.168.31.108 - stingray [06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" 200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
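As a sanity check outside Splunk, the field layout of a line like this can be tried against a plain Python regex. This is only a rough sketch: the pattern below is a simplified approximation written for this example, not the Splunk transform itself, and the names `VCOMBINED` and `parse_vcombined` are hypothetical. It uses ordinary `+` quantifiers rather than the possessive `++` seen in the Splunk regexes.

```python
import re

# Simplified approximation of a vhost-prefixed combined log line:
# one named group per field, with the virtual host first.
VCOMBINED = re.compile(
    r'^(?P<vhost>\S+)\s+(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<req_time>[^\]]*)\]\s+'
    r'"(?P<method>\S+)\s+(?P<uri>\S+)(?:\s+(?P<version>[^"\s]+))?\s*"\s+'
    r'(?P<status>\S+)\s+(?P<bytes>\S+)'
    r'(?:\s+"(?P<referer>[^"]*)"(?:\s+"(?P<useragent>[^"]*)")?)?'
)


def parse_vcombined(line):
    """Return a dict of field name -> value, or None if the line does not match."""
    m = VCOMBINED.match(line)
    return m.groupdict() if m else None


SAMPLE = (
    'developer.management.theclient.rdev.com 192.168.31.108 - stingray '
    '[06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&id=1 HTTP/1.1" '
    '200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&id=1" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"'
)

fields = parse_vcombined(SAMPLE)
print(fields["vhost"], fields["clientip"], fields["status"])
```

If the vhost group comes back as the first token and the status and bytes line up, the field order assumed by the transform is at least consistent with the log format.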
