Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

Communicator

Hi

I have a problem with my access_combined logs, which have a vhost at the beginning, like this:

www.domain.com:80 10.60.50.40 - - [04/Nov/2015:11:14:26 +0100] "GET /path/to/file/custom/flexslider.css HTTP/1.1" 200 1663 "http://www.domain.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

When I index it, Splunk doesn't extract the access_combined fields.
I already tried creating a new transforms.conf and props.conf.

I'm indexing those logs with sourcetype=webserver_access_combined

Props.conf

[webserver_access_combined]
pulldown_type = true 
maxDist = 28
MAX_TIMESTAMP_LOOKAHEAD = 128
REPORT-access = vhost-access-extractions
SHOULD_LINEMERGE = False
TIME_PREFIX = \[
category = Web
description = National Center for Supercomputing Applications (NCSA) combined format HTTP web server logs (can be generated by apache or other web servers)

Transforms.conf

[vhost-access-extractions]
# matches access-common or access-combined apache logging formats
# Extracts: clientip, clientport, ident, user, req_time, method, uri, root, file, uri_domain, uri_query, version, status, bytes, referer_url, referer_domain, referer_proto, useragent, cookie, other (remaining chars)  
# Note: referer is misspelled on purpose because that is the "official" spelling for "HTTP referer"
REGEX = ^[[nspaces:vhost]]\s++[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

I have these configurations on my indexers, and I see the logs arriving with the correct sourcetype, but the fields are not extracted.

Does anybody have an idea why it doesn't work?

Thanks!

Re: Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

Contributor

Did you build out your extractions and confirm them in something like regex101? I copied your example log and your extractions, and they did not match. I made a start, and for the first few fields it would look more like this: ^(?<vhost>\S+):(?<clientport>\d+)\s(?<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s

Also, you'll want your extractions to take place at search time in your props.conf, like this:

EXTRACT-blah = ^(?<vhost>\S+):(?<clientport>\d+)\s(?<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s
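
If you don't have regex101 handy, a quick offline check of that prefix pattern against the sample event could look like the sketch below. It is only a rough stand-in for Splunk's own regex engine: Python writes named groups as (?P<name>...) instead of (?<name>...), the pattern is assumed to be anchored at the start of the event with ^, and the names SAMPLE, PREFIX and extract_prefix are made up for this example.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    'www.domain.com:80 10.60.50.40 - - [04/Nov/2015:11:14:26 +0100] '
    '"GET /path/to/file/custom/flexslider.css HTTP/1.1" 200 1663 '
    '"http://www.domain.com/" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"'
)

# Same idea as the EXTRACT above, rewritten in Python's named-group syntax.
# The dots in the IP address are escaped so they only match literal dots.
PREFIX = re.compile(
    r'^(?P<vhost>\S+):(?P<clientport>\d+)\s'
    r'(?P<clientip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s'
)


def extract_prefix(event):
    """Return the leading fields as a dict, or None if the event does not match."""
    match = PREFIX.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(extract_prefix(SAMPLE))
```

If this returns None for one of your real events, the pattern needs work before it goes anywhere near props.conf.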

Re: Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

Communicator

I just used the original that was in transforms.conf, like this:

REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]](?:\s++"(?<referer>[[bc_domain:referer_]]?+[^"]*+)"(?:\s++[[qstring:useragent]](?:\s++[[qstring:cookie]])?+)?+)?[[all:other]]

and tried to change this one... so this isn't the correct way?

Re: Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

Contributor

Based on what I'm seeing, that won't work. To see if your regex works, do something like this:

Your Search | rex "^(?<vhost>\S+)\s+(?<clientip>\S+)\s++(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+\"(?<access_request>[^\"]+)\"\s+(?<status>\S+)\s+(?<bytes>\S+)\s+\"(?<referrer>[^\"]+)\"\s+\"(?<user_agent>[^\"]+)\""

Re: Why is my props.conf and transforms.conf configuration not extracting fields from access_combined logs with a vhost?

Esteemed Legend

Your REGEX is crazy; try this one:

REGEX=^(?<vhost>\S+)\s+(?<clientip>\S+)\s++(?<ident>\S+)\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+"(?<access_request>[^"]+)"\s+(?<status>\S+)\s+(?<bytes>\S+)\s+"(?<referrer>[^"]+)"\s+"(?<user_agent>[^"]+)"
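
To see what that pattern would pull out of the sample event from the question, here is a small Python sketch. Treat it as an approximation and not as what Splunk does internally: the named groups use Python's (?P<name>...) spelling, the one possessive \s++ is written as plain \s+ so it also runs on older Python versions, and FULL, parse_event and the method/uri/version split are additions for illustration only.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    'www.domain.com:80 10.60.50.40 - - [04/Nov/2015:11:14:26 +0100] '
    '"GET /path/to/file/custom/flexslider.css HTTP/1.1" 200 1663 '
    '"http://www.domain.com/" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"'
)

# The suggested REGEX, translated to Python named-group syntax.
FULL = re.compile(
    r'^(?P<vhost>\S+)\s+(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<req_time>[^\]]+)\]\s+"(?P<access_request>[^"]+)"\s+'
    r'(?P<status>\S+)\s+(?P<bytes>\S+)\s+'
    r'"(?P<referrer>[^"]+)"\s+"(?P<user_agent>[^"]+)"'
)


def parse_event(event):
    """Return a dict of extracted fields, or None if the event does not match."""
    match = FULL.match(event)
    if match is None:
        return None
    fields = match.groupdict()
    # Illustrative extra step: split the request line into its three parts.
    parts = fields["access_request"].split(" ", 2)
    if len(parts) == 3:
        fields["method"], fields["uri"], fields["version"] = parts
    return fields


if __name__ == "__main__":
    for name, value in parse_event(SAMPLE).items():
        print(f"{name} = {value}")
```

Note that with this pattern vhost keeps the port (www.domain.com:80), and the referrer and user agent groups need at least one character between the quotes, so an event with an empty "" in either position would not match.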