Starting a new project with Adobe's CQ5...
I'm starting with the access log, as it's straightforward.
I've done field extractions before for another custom log type, and they worked great. Now I can't get any of my extractions to appear in Search.
Walkthrough:
^(?P<FIELDNAME>\d+\.\d+\.\d+\.\d+?)
^(?P<FIELDNAME>\d+\.\d+\.\d+\.\d+?)
^(?P<ip_address>\d+\.\d+\.\d+\.\d+?)
For my last project, I simply opened the Extract Fields tool, entered my regex, saved it, and the fields appeared right in Search.
props.conf for modified extraction
[cq5-access]
EXTRACT-ip_address = ^(?P<ip_address>\d+\.\d+\.\d+\.\d+?)
props.conf with original full extraction
[cq5-access]
EXTRACT-ip_address-username-day-month-year-hour-minute-second-http_type-http_request-http_code-referer-user_agent = ^(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s.+?\s(?P<username>.+?)\s(?P<day>\d\d)/(?P<month>\w\w\w)/(?P<year>\d\d\d\d):(?P<hour>\d\d):(?P<minute>\d\d):(?P<second>\d\d)\s.+?\s"(?P<http_type>\w+?)\s(?P<http_request>.+?)\sHTTP.+?"\s(?<http_code>\d+?)\s.+?\s"(?P<referer>.+?)"\s"(?P<user_agent>.+?)"
Sample data:
10.71.40.57 - admin 23/Apr/2013:16:15:14 -0400 "GET /crx/server/crx.default/jcr%3aroot/etc/map/http.1.json?_dc=1366748119022&node=xnode-339 HTTP/1.1" 200 175 "https://twcc-ci01.lab.webapps.rr.com:4602/crx/de/index.jsp" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"
10.71.40.57 - admin 23/Apr/2013:16:15:13 -0400 "GET /crx/de/icons/crxde_favicon.ico HTTP/1.1" 200 295606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"
127.0.0.1 - admin 23/Apr/2013:16:42:31 -0400 "GET /bin/receive?sling:authRequestLogin=1 HTTP/1.1" 200 32 "-" "Jakarta Commons-HttpClient/3.1"
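As a quick sanity check outside Splunk, the full extraction can be run against these sample lines with Python's re module. This is only an approximation, since Splunk evaluates extractions with PCRE, not Python's engine. The one `(?<http_code>...)` group is rewritten here as `(?P<http_code>...)`, the named-group form Python's re documents; `CQ5_ACCESS`, `SAMPLE_LINES` and `parse` are illustrative names.

```python
import re

# The full extraction from the original props.conf. The (?<http_code>...) group is
# written as (?P<http_code>...) here, the named-group form Python's re documents.
CQ5_ACCESS = re.compile(
    r'^(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s.+?\s(?P<username>.+?)\s'
    r'(?P<day>\d\d)/(?P<month>\w\w\w)/(?P<year>\d\d\d\d):'
    r'(?P<hour>\d\d):(?P<minute>\d\d):(?P<second>\d\d)\s.+?\s'
    r'"(?P<http_type>\w+?)\s(?P<http_request>.+?)\sHTTP.+?"\s'
    r'(?P<http_code>\d+?)\s.+?\s"(?P<referer>.+?)"\s"(?P<user_agent>.+?)"'
)

# The sample lines from the question.
SAMPLE_LINES = [
    '10.71.40.57 - admin 23/Apr/2013:16:15:14 -0400 "GET /crx/server/crx.default/jcr%3aroot/etc/map/http.1.json?_dc=1366748119022&node=xnode-339 HTTP/1.1" 200 175 "https://twcc-ci01.lab.webapps.rr.com:4602/crx/de/index.jsp" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"',
    '10.71.40.57 - admin 23/Apr/2013:16:15:13 -0400 "GET /crx/de/icons/crxde_favicon.ico HTTP/1.1" 200 295606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"',
    '127.0.0.1 - admin 23/Apr/2013:16:42:31 -0400 "GET /bin/receive?sling:authRequestLogin=1 HTTP/1.1" 200 32 "-" "Jakarta Commons-HttpClient/3.1"',
]


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = CQ5_ACCESS.match(line)
    return m.groupdict() if m else None


for line in SAMPLE_LINES:
    fields = parse(line)
    print(fields["ip_address"], fields["username"], fields["http_code"], fields["http_request"])
```

The lazy `\d+?` in `http_code` is harmless here because the `\s` right after it forces the match to take every digit; a lazy quantifier only cuts a match short when nothing follows it.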
This isn't necessarily related to your problem, but I don't think your regex will give you the expected results. The lazy modifier (?) at the end of your regex makes the last octet of the IP address stop after a single digit, so for any IP whose last octet has 2 or 3 digits, you'll lose the trailing digits. I believe the ip_address extraction in your original full extraction will work better.
Also, I've seen some unexpected results in Splunk when using the start-of-line anchor (^), so I try to avoid it where possible. Here is a modified regex that drops the ^ (it matches the " -" that follows the IP in your example data instead) and replaces the lazy \d+? with \d{1,3}. Give it a shot...
(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s-
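To see what this pattern picks up without the ^ anchor, here is a small sketch using Python's re module on two of the sample lines. It assumes the dots are escaped as `\.`; an unescaped `.` matches any character, which the last lines illustrate. Python's engine stands in for Splunk's PCRE here, and `IP_THEN_DASH`, `loose` and `lines` are illustrative names.

```python
import re

# Suggested pattern: no ^ anchor; instead require the " -" that follows the IP.
# Dots are escaped so they only match a literal ".".
IP_THEN_DASH = re.compile(r'(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s-')

lines = [
    '10.71.40.57 - admin 23/Apr/2013:16:15:13 -0400 "GET /crx/de/icons/crxde_favicon.ico HTTP/1.1" 200 295606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"',
    '127.0.0.1 - admin 23/Apr/2013:16:42:31 -0400 "GET /bin/receive?sling:authRequestLogin=1 HTTP/1.1" 200 32 "-" "Jakarta Commons-HttpClient/3.1"',
]

for line in lines:
    m = IP_THEN_DASH.search(line)
    print(m.group("ip_address") if m else None)

# With unescaped dots, the same shape also accepts text that is not an IP address.
loose = re.compile(r'(?P<ip_address>\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\s-')
print(loose.search("10x71x40x57 -").group("ip_address"))
print(IP_THEN_DASH.search("10x71x40x57 -"))
```

Because `\d{1,3}` is greedy and followed by `\s-`, the last octet keeps all of its digits, which is the behaviour the original `\d+?` was missing.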
This is correct. For example:
>>> import re
>>> re.findall(r'^(?P<ip_address>\d+\.\d+\.\d+\.\d+?)', '10.71.40.57 -')
['10.71.40.5']
>>> re.findall(r'^(?P<ip_address>\d+\.\d+\.\d+\.\d+)', '10.71.40.57 -')
['10.71.40.57']
Is cq5-access the sourcetype, or a filename you're reading?
I'd try to use underscores instead of dashes in all names (sourcetypes, fields, anything) where possible. There have been issues with fields not showing up when names contained dashes.
http://splunk-base.splunk.com/answers/48611/bug-in-interactive-field-extractor-ifx
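If you want to check a props.conf for dashed names before renaming anything, here is a rough sketch in Python. `find_dashed_names` is a hypothetical helper, not a Splunk tool; it reads the stanza with configparser (an assumption that your props.conf parses cleanly that way) and only reports names, it doesn't rewrite the file. The dash right after `EXTRACT` is required syntax, so only the class name after it is checked.

```python
import configparser

# The stanza and full extraction from the question, as a props.conf snippet.
PROPS = r"""
[cq5-access]
EXTRACT-ip_address-username-day-month-year-hour-minute-second-http_type-http_request-http_code-referer-user_agent = ^(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s.+?\s(?P<username>.+?)\s(?P<day>\d\d)/(?P<month>\w\w\w)/(?P<year>\d\d\d\d):(?P<hour>\d\d):(?P<minute>\d\d):(?P<second>\d\d)\s.+?\s"(?P<http_type>\w+?)\s(?P<http_request>.+?)\sHTTP.+?"\s(?<http_code>\d+?)\s.+?\s"(?P<referer>.+?)"\s"(?P<user_agent>.+?)"
"""


def find_dashed_names(props_text):
    """Hypothetical helper: return (name, underscore_version) pairs for stanza names
    and EXTRACT class names that contain dashes."""
    # Split keys from values on "=" only, and turn off %-interpolation for regex values.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    parser.read_string(props_text)
    suggestions = []
    for stanza in parser.sections():
        if "-" in stanza:
            suggestions.append((stanza, stanza.replace("-", "_")))
        for key in parser[stanza]:
            # "EXTRACT-" itself is required; only look at the class name after it.
            prefix, sep, cls = key.partition("-")
            if prefix.upper() == "EXTRACT" and sep and "-" in cls:
                suggestions.append((key, f"{prefix}-{cls.replace('-', '_')}"))
    return suggestions


for old, new in find_dashed_names(PROPS):
    print(f"{old} -> {new}")
```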
/K