I am attempting to perform a search time field extraction via the rex command. I use the default field of _raw and give it a regex with named groups. None of my named groups are showing up as an available field to select from.
Essentially, I am parsing a custom Apache access log.
An example line of data is:
9.999.999.999 9.999.999.9 xxxxxxxx [17/Jun/2014:23:11:43 -0400] "GET /someapp/css/windows/default.css HTTP/1.1" 200 767 "protocol://www.ourserver.com/someapp/some.jsp?param=1&param2=a" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"
The search I use is:
source=/issue.log| rex "(?:[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+, )?(?<forwardedforip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|\-) (?<remoteip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) (?<userid>\S+|\-)[ ]+\[(?<day>\d+)/(?<month>\w+)/(?<year>\d+):(?<hour>\d+):(?<minute>\d+):(?<second>\d+) (<?timezone>-\d+)] \"(?<action>\w+) (?<url>.*?)(?<parameters>\?.*?)? (?<httpversion>\S+)\" (?<httpstatus>\d+) (?<responsesize>\d+|\-) \"(?<refererurl>.*?)\" \"(?<useragent>.*?)\""
Any ideas why my named groups are not showing up? This regex works without the named groups in regex testing apps. I just cannot get it to be recognized by Splunk.
thanks!
Try this
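I believe you just misplaced one '?' in the timezone field extraction: you wrote (<?timezone> where it should be (?<timezone>. With the '?' after the '<', that group is no longer a named group; it is a plain capture group looking for the literal text "timezone>", so the whole regex fails to match and none of the fields get extracted. Everything else works.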
source=/issue.log | rex "(?:[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+, )?(?<forwardedforip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|\-) (?<remoteip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) (?<userid>\S+|\-)[ ]+\[(?<day>\d+)/(?<month>\w+)/(?<year>\d+):(?<hour>\d+):(?<minute>\d+):(?<second>\d+) (?<timezone>-\d+)] \"(?<action>\w+) (?<url>.*?)(?<parameters>\?.*?)? (?<httpversion>\S+)\" (?<httpstatus>\d+) (?<responsesize>\d+|\-) \"(?<refererurl>.*?)\" \"(?<useragent>.*?)\""
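To see why one misplaced '?' hides every field, here is a minimal sketch outside Splunk. It is only an illustration: it uses Python's re module, which spells named groups (?P<name>...) instead of the (?<name>...) form that rex accepts, and it only covers the timestamp part of the pattern against a shortened sample. The variable names are made up for the example.

```python
import re

# Timestamp portion of the sample log line from the question.
sample = "[17/Jun/2014:23:11:43 -0400]"

# Misplaced '?': "(<?timezone>" is an ordinary capture group that expects an
# optional literal "<" followed by the literal text "timezone>".
broken = re.compile(
    r"\[(?P<day>\d+)/(?P<month>\w+)/(?P<year>\d+):\d+:\d+:\d+ (<?timezone>-\d+)\]"
)

# Corrected: "(?P<timezone>" is a named group
# (Python's spelling of the "(?<timezone>" used in the rex command).
fixed = re.compile(
    r"\[(?P<day>\d+)/(?P<month>\w+)/(?P<year>\d+):\d+:\d+:\d+ (?P<timezone>-\d+)\]"
)

broken_match = broken.search(sample)
fixed_match = fixed.search(sample)

# The broken pattern does not match at all, so no named group is populated.
print(broken_match)
# The fixed pattern matches and exposes every named group.
print(fixed_match.groupdict())
```

The broken pattern is still a valid regex, which is why nothing reports an error. It simply never matches the event, so there is nothing to extract.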
That did it. You know, I looked at this over and over thinking it was something like this and kept missing it.
Thank you!
However, if I look at a specific field, Apache_Request, it works!
source=/issue.log| rex field="Apache_Request" "(?<action>\w+) (?<url>.*?)(?<parameters>\?.*?)? (?<httpversion>\S+)"
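As a rough check of what that request-only pattern captures, here is a small sketch in Python (named groups written as (?P<name>...) rather than the (?<name>...) form used by rex). It assumes Apache_Request holds just the quoted request line; the second request string is a made-up example built from the referer URL in the sample, and parse_request is a hypothetical helper name.

```python
import re

# Same pattern as the rex above, with Python's named-group spelling.
REQUEST_RE = re.compile(
    r"(?P<action>\w+) (?P<url>.*?)(?P<parameters>\?.*?)? (?P<httpversion>\S+)"
)

def parse_request(request):
    """Return the named groups for a request line, or None if it does not match."""
    match = REQUEST_RE.search(request)
    return match.groupdict() if match else None

# Request line taken from the sample event (no query string).
plain = parse_request("GET /someapp/css/windows/default.css HTTP/1.1")

# Hypothetical request line with a query string.
with_query = parse_request("GET /someapp/some.jsp?param=1&param2=a HTTP/1.1")

print(plain)
print(with_query)
```

When there is no query string, the optional parameters group does not take part in the match, so it comes back empty (None in Python).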