<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Apache logfile with virtualhost added to logs in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39230#M7279</link>
    <description>&lt;P&gt;Did Answers strip the leading backslash from your &lt;CODE&gt;\s++&lt;/CODE&gt;? It only shows up as 's++'.&lt;/P&gt;</description>
    <pubDate>Thu, 25 Aug 2016 13:03:22 GMT</pubDate>
    <dc:creator>sloshburch</dc:creator>
    <dc:date>2016-08-25T13:03:22Z</dc:date>
    <item>
      <title>Apache logfile with virtualhost added to logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39226#M7275</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;We are trying to index a set of web servers that serve many virtual hosts. Adding the virtual host to the Apache log is simple enough: change the LogFormat from&lt;/P&gt;
&lt;PRE&gt;LogFormat "%h %l %u %t \"%r\" %&amp;gt;s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined&lt;/PRE&gt;
&lt;P&gt;to&lt;/P&gt;
&lt;PRE&gt;LogFormat "%V %h %l %u %t \"%r\" %&amp;gt;s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined&lt;/PRE&gt;

&lt;P&gt;However, this breaks the automatic field extraction that Splunk used to perform on Apache log files.&lt;/P&gt;

&lt;P&gt;So I dug into &lt;STRONG&gt;/opt/splunk/etc/system/default/transforms.conf&lt;/STRONG&gt; and found these lines&lt;BR /&gt;
&lt;PRE&gt;[access-extractions]&lt;BR /&gt;
# matches access-common or access-combined apache logging formats&lt;BR /&gt;
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)&lt;BR /&gt;
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"&lt;BR /&gt;
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?&amp;lt;referer&amp;gt;[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;and in &lt;STRONG&gt;/opt/splunk/etc/system/default/props.conf&lt;/STRONG&gt; found this&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
[access_combined]&lt;BR /&gt;
pulldown_type = true&lt;BR /&gt;
maxDist = 28&lt;BR /&gt;
MAX_TIMESTAMP_LOOKAHEAD = 128&lt;BR /&gt;
REPORT-access = access-extractions&lt;BR /&gt;
SHOULD_LINEMERGE = False&lt;BR /&gt;
TIME_PREFIX = \[&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;I can see that I just need to add &lt;CODE&gt;[[nspaces:vhost]]\s&lt;/CODE&gt; to the start of that transforms.conf regex, but I obviously don't want to modify the defaults.&lt;/P&gt;
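&lt;P&gt;A note on the &lt;CODE&gt;[[...]]&lt;/CODE&gt; syntax in the default stanza above: in Splunk's modular regexes, &lt;CODE&gt;[[stanza:name]]&lt;/CODE&gt; appears to splice in the REGEX of another transforms.conf stanza and capture its match as the field &lt;CODE&gt;name&lt;/CODE&gt;. The sketch below illustrates this for the vhost prefix; the plain-regex equivalent in the comments is an approximation for explanation only, not a stanza to deploy.&lt;BR /&gt;
&lt;PRE&gt;# Helper stanza shipped in system/default/transforms.conf
[nspaces]
# matches one or more NON space characters
REGEX = \S+

# Inside another stanza's REGEX, this prefix:
#   ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++
# should behave roughly like the plain regex (approximation, for illustration):
#   ^(?&amp;lt;vhost&amp;gt;\S+)\s++(?&amp;lt;clientip&amp;gt;\S+)\s++
# For a line starting "developer.management.theclient.rdev.com 192.168.31.108 - stingray ..."
# that captures vhost=developer.management.theclient.rdev.com and clientip=192.168.31.108&lt;/PRE&gt;&lt;/P&gt;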

&lt;P&gt;I tried to replicate the props.conf and transforms.conf stanzas in my own app, but it just didn't seem to work.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;my inputs.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;[monitor:///etc/httpd/logs/access_log*]&lt;BR /&gt;
sourcetype = vhost_access_combined&lt;BR /&gt;
disabled = false&lt;BR /&gt;
followTail = 0&lt;BR /&gt;
host = development.server.com&lt;BR /&gt;
index = webserver&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;my props.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;[vhost_access_combined]&lt;BR /&gt;
pulldown_type = true&lt;BR /&gt;
maxDist = 28&lt;BR /&gt;
MAX_TIMESTAMP_LOOKAHEAD = 128&lt;BR /&gt;
REPORT-access = vhost-access-extractions&lt;BR /&gt;
SHOULD_LINEMERGE = False&lt;BR /&gt;
TIME_PREFIX = \[&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;my transforms.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;[vhost-access-extractions]&lt;BR /&gt;
# matches access-common or access-combined apache logging formats&lt;BR /&gt;
# Extracts: vhost, clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)&lt;BR /&gt;
# Note: referer is misspelled in purpose because that is the "official" spelling for "HTTP referer"&lt;BR /&gt;
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?&amp;lt;referer&amp;gt;[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;Any ideas on how to get this working?&lt;/P&gt;

&lt;P&gt;I have a more complex follow-up question about setting the Splunk host to the vhost value from each log entry, but I will take this one step at a time.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:42:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39226#M7275</guid>
      <dc:creator>phoenixdigital</dc:creator>
      <dc:date>2020-09-28T09:42:59Z</dc:date>
    </item>
    <item>
      <title>Re: Apache logfile with virtualhost added to logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39227#M7276</link>
      <description>&lt;P&gt;Here is an example log line:&lt;/P&gt;

&lt;PRE&gt;developer.management.theclient.rdev.com 192.168.31.108 - stingray [06/Jul/2011:12:33:21 +1000] "GET /pop.php?m=testimonial/edit&amp;amp;id=1 HTTP/1.1" 200 166 "http://developer.management.theclient.rdev.com/?m=testimonial/details&amp;amp;id=1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"&lt;/PRE&gt;</description>
      <pubDate>Wed, 06 Jul 2011 03:00:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39227#M7276</guid>
      <dc:creator>phoenixdigital</dc:creator>
      <dc:date>2011-07-06T03:00:28Z</dc:date>
    </item>
    <item>
      <title>Re: Apache logfile with virtualhost added to logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39228#M7277</link>
      <description>&lt;P&gt;OK, I eventually got it working with the following configuration.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;inputs.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;[monitor:///etc/httpd/logs/access_log*]
sourcetype = advanced_access_combined
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au

[monitor:///etc/httpd/logs/error_log*]
index = webserver
disabled = false
followTail = 0
host = devserver.remora.com.au&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;props.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;&lt;BR /&gt;
[advanced_access_combined]&lt;BR /&gt;
pulldown_type = true&lt;BR /&gt;
maxDist = 28&lt;BR /&gt;
MAX_TIMESTAMP_LOOKAHEAD = 128&lt;BR /&gt;
REPORT-access = advanced-access-extractions&lt;BR /&gt;
SHOULD_LINEMERGE = False&lt;BR /&gt;
TIME_PREFIX = \[&lt;BR /&gt;
&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;transforms.conf&lt;/STRONG&gt;&lt;BR /&gt;
&lt;PRE&gt;[all_lazy]
REGEX = .*?

[all]
REGEX = .*

[nspaces]
# matches one or more NON space characters
REGEX = \S+

[qstring]
#matches a quoted "string" - extracts an unnamed variable - name MUST be provided as in [[qstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = "(?&amp;lt;&amp;gt;[^"]*+)"

[sbstring]
#matches a string enclosed in [] - extracts an unnamed variable - name MUST be provided as in [[sbstring:name]]
# Extracts: empty-name-group (needs name)
REGEX = \[(?&amp;lt;&amp;gt;[^\]]*+)\]

[bc_domain]
REGEX = (?&amp;lt;domain&amp;gt;\w++://[^/\s"]++)

[bc_uri]
# backwards compatible uri regex
# uri  = path optionally followed by query [/this/path/file.js?query=part&amp;amp;other=var]
# path = root part followed by file        [/root/part/file.part]
# Extracts: uri, uri_path, root, file, uri_query, uri_domain (optional if in proxy mode)
REGEX = (?&amp;lt;uri&amp;gt;[[bc_domain:uri_]]?+(?&amp;lt;uri_path&amp;gt;[[uri_root]]?[[uri_seg]]*(?&amp;lt;file&amp;gt;[^\s\?/]+)?)(?:\?(?&amp;lt;uri_query&amp;gt;[^\s]*))?)

[reqstr]
REGEX = [^\s"]++

[access-request]
# very relaxed regex for extracting fields from the request
REGEX = "\s*+[[reqstr:method]]?(?:\s++[[bc_uri]](?:\s++[[reqstr:version]])*)?\s*+"

[advanced-access-extractions]
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]\s++[[nspaces:req_process_time]](?:\s++"(?&amp;lt;referer&amp;gt;[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]&lt;/PRE&gt;&lt;/P&gt;

&lt;P&gt;It seems I needed to copy a lot of the helper stanzas from &lt;STRONG&gt;/opt/splunk/etc/system/default/transforms.conf&lt;/STRONG&gt;, which makes sense.&lt;/P&gt;

&lt;P&gt;Another issue I ran into: I have a primary indexing server, and the Apache log files are forwarded to it by a Universal Forwarder.&lt;/P&gt;

&lt;P&gt;None of this worked while props.conf and transforms.conf were on the Universal Forwarder. I had to add them to the indexing server before the log files were parsed correctly.&lt;/P&gt;

&lt;P&gt;This could become a problem, because I would like the virtual host in the log file to be used as the Splunk host. The host is currently set in inputs.conf on the Universal Forwarder, but I don't extract the virtual host until transforms.conf runs on the indexing server. I suspect that by then it is too late to set the Splunk host. Anyway, I will open a new question for that, as it is out of scope here.&lt;/P&gt;
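&lt;P&gt;For reference, Splunk documents an index-time override for this case: a TRANSFORMS- class in props.conf whose transform writes to &lt;CODE&gt;DEST_KEY = MetaData:Host&lt;/CODE&gt;. Because the Universal Forwarder sends data unparsed and this transform runs during parsing on the indexer (or a heavy forwarder), it may not be too late after all. A minimal sketch, assuming the vhost is always the first space-delimited token of each event; the transform name &lt;CODE&gt;set_host_from_vhost&lt;/CODE&gt; and the class name &lt;CODE&gt;vhost_as_host&lt;/CODE&gt; are hypothetical:&lt;BR /&gt;
&lt;PRE&gt;# props.conf on the indexing server (add to the existing stanza)
[advanced_access_combined]
# index-time transform; the class name after TRANSFORMS- is arbitrary
TRANSFORMS-vhost_as_host = set_host_from_vhost

# transforms.conf on the indexing server
[set_host_from_vhost]
# capture the leading vhost token (assumes %V is the first LogFormat field)
REGEX = ^(\S+)\s
DEST_KEY = MetaData:Host
FORMAT = host::$1&lt;/PRE&gt;&lt;/P&gt;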

&lt;P&gt;Edit: The forum formatting mangles pasted conf files, so the configs above are a bit munted. If anyone needs the configs, message me (if that's possible on Splunkbase).&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:43:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39228#M7277</guid>
      <dc:creator>phoenixdigital</dc:creator>
      <dc:date>2020-09-28T09:43:14Z</dc:date>
    </item>
    <item>
      <title>Re: Apache logfile with virtualhost added to logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39229#M7278</link>
      <description>&lt;P&gt;A Universal Forwarder does not parse data, so props.conf and transforms.conf settings like these have no effect there.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Typesofforwarders&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Feb 2012 16:58:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39229#M7278</guid>
      <dc:creator>oscarspaz</dc:creator>
      <dc:date>2012-02-29T16:58:45Z</dc:date>
    </item>
    <item>
      <title>Re: Apache logfile with virtualhost added to logs</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39230#M7279</link>
      <description>&lt;P&gt;Did Answers strip the leading backslash from your &lt;CODE&gt;\s++&lt;/CODE&gt;? It only shows up as 's++'.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2016 13:03:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Apache-logfile-with-virtualhost-added-to-logs/m-p/39230#M7279</guid>
      <dc:creator>sloshburch</dc:creator>
      <dc:date>2016-08-25T13:03:22Z</dc:date>
    </item>
  </channel>
</rss>

