Hello. I'll preface this by saying I'm by no means a regex user.
I'm working with a custom Apache format that Splunk 6 is not extracting correctly. I'm just loosely trying to assign each Apache field an identifier so it will populate interesting fields.
LogFormat "%h %{forwarded}e %{host}i %t %D \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Log example:
1.1.1.1 2.2.2.2 host.domain.com [07/Nov/2013:21:59:49 +0000] 88040 "GET /api/v3/projects HTTP/1.1" 200 82 "referer_URL_here_because_ask_site_won't_let_me_post" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:25.0) Gecko/20100101 Firefox/25.0"
I'm assigning the sourcetype on the UF as "access_custom". Here are my props and transforms. Any help would be greatly appreciated.
[access_custom]
TRANSFORM-format=apache_format
[apache_format]
REGEX=(.*) (.*) (.*) \[(.*)\] (.*) \"(.*)\" ([0-9]*) ([0-9]*) \"(.*)\" \"(.*)\"
FORMAT=remotehost::$1 clientip::$2 hostheader::$3 timestamp::$4 req_time::$5 url::$6 statuscode::$7 bytes::$8 referer::$9 user-agent::$10
Thanks!
[apache_format]
REGEX=([\d\.]+)\s([\d\.]+)\s([\w\.]+)\s\[(\d+\/\w+\/\d+\:\d+\:\d+\:\d+\s\+\d+)\]\s(\d+)\s\"([^\"]+)\"\s(\d+)\s(\d+)\s\"([^\"]*)\"\s\"([^\"]*)\"
FORMAT=remotehost::$1 clientip::$2 hostheader::$3 timestamp::$4 req_time::$5 url::$6 statuscode::$7 bytes::$8 referer::$9 user-agent::$10
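If you want to sanity-check a pattern before putting it in transforms.conf, here is a rough sketch that runs a ten-group regex against the sample event from the question. A few assumptions: Python's `re` module is standing in for Splunk's PCRE engine (close enough for a pattern this simple, but not identical), the pattern is a simplified one of my own rather than the exact REGEX above, and the names `PATTERN`, `FIELDS`, and `extract` are just made up for the example.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    '1.1.1.1 2.2.2.2 host.domain.com [07/Nov/2013:21:59:49 +0000] 88040 '
    '"GET /api/v3/projects HTTP/1.1" 200 82 '
    '"referer_URL_here_because_ask_site_won\'t_let_me_post" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:25.0) '
    'Gecko/20100101 Firefox/25.0"'
)

# Field names in the same order as the FORMAT line ($1 .. $10).
FIELDS = [
    "remotehost", "clientip", "hostheader", "timestamp", "req_time",
    "url", "statuscode", "bytes", "referer", "user-agent",
]

# Simplified pattern (an assumption, not the REGEX from the answer):
# unquoted fields are runs of non-whitespace, the timestamp is whatever
# sits between [ and ], and quoted fields are whatever sits between quotes.
PATTERN = re.compile(
    r'^(\S+)\s(\S+)\s(\S+)\s\[([^\]]+)\]\s(\d+)\s'
    r'"([^"]*)"\s(\d+)\s(\S+)\s"([^"]*)"\s"([^"]*)"'
)


def extract(line):
    """Return a dict of field name -> value, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


if __name__ == "__main__":
    fields = extract(SAMPLE)
    for name in FIELDS:
        print(f"{name} = {fields[name]}")
```

If `extract` returns None for a real event, the pattern is too strict for that line. One case worth checking is `%b`, which Apache logs as `-` when no bytes were sent; that is why the bytes group here is `\S+` rather than `\d+`.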