I have logs that contain a long series of pipe-delimited fields.
My issue is that some fields have no value, and instead of a placeholder character being written in place of the NULL field, the field is simply left blank.
For example, this log records various information about site visitors; some fields are left blank depending on the device and the parts of the website visited.
|wired|||||/||00005wcuu-jSbW_AypQB1ZDLdjH:180ds1m45|Search|||Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36|
But when I extend the regex to cover this scenario, I still receive the error "the generated regex was unable to match all examples".
Here is the regex I created in an attempt to generate the fields:
^(?P<FIELDNAME1>[^\|]+)\|(?P<FIELDNAME2>[^\|]+)\|(?P<FIELDNAME3>[^\|]+)\|(?P<FIELDNAME4>[^\|]+)\|(?P<FIELDNAME5>[^\|]+)\|(?P<FIELDNAME6>[^\|]+)\|(?P<FIELDNAME7>[^\|]+)\|(?P<FIELDNAME8>[^\|]+)\|(?P<FIELDNAME9>[^\|]+)\|(?P<FIELDNAME10>[^\|]+)\|(?P<FIELDNAME11>[^\|]+)\|(?P<FIELDNAME12>[^\|]+)
None of the log events will contain pipes within the fields, so I thought it would be simple enough to tell Splunk that anything (even nothing) between two pipes is a field.
Any suggestions are greatly appreciated!
One problem is that you use the one-or-more quantifier (the plus sign) for your non-pipe character classes, so an empty field between two pipes can never match. Use the zero-or-more quantifier (the asterisk) or, perhaps better still, don't use regex at all.
Have you looked into REPORT instead of EXTRACT? This allows you to make your extractions by specifying a delimiter, e.g.
props.conf
[your_sourcetype]
REPORT-www = extract_weblog_fields
transforms.conf
[extract_weblog_fields]
DELIMS = "|"
FIELDS = field1, field2, field3, field4
Just name the fields appropriately. Read more on REPORT and FIELD EXTRACTION in the docs.
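To illustrate the idea behind a delimiter-based extraction (this is only a rough Python sketch of the concept, not how Splunk implements DELIMS/FIELDS), splitting on the pipe keeps empty fields in position so each value still lines up with its name. The field names and the shortened sample line are hypothetical.

```python
# Hypothetical field names and a shortened sample line;
# neither is taken from the real logs.
FIELD_NAMES = ["connection", "referrer", "path", "action"]


def split_fields(line, names, delimiter="|"):
    """Split a delimited line and pair each value with a field name.

    Empty strings are kept, so a blank field still occupies its slot.
    """
    values = line.split(delimiter)
    return dict(zip(names, values))


record = split_fields("wired||/|Search", FIELD_NAMES)
print(record)
```

Note that in Python a line that begins with a pipe, like the sample event, yields an empty first element, so the list of names would need a slot for it.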
/K
Can I do this when I don't have Splunk installed locally and am just accessing it through the browser?
I definitely need to do more reading on this. I suppose I could create the config files and submit them to my administrator?
By using '+' in your field descriptions, you're telling the regex engine that there must be at least one character between pipes. Try using '*' instead.
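A quick way to see the difference outside Splunk is to try both quantifiers on a line with an empty field. This is a minimal sketch using Python's `re` module, which accepts the same `(?P<name>...)` group syntax; the four-field patterns, the group names `f1` to `f4`, and the shortened sample line are made up for illustration.

```python
import re

# Shortened, hypothetical line in the same style as the log above;
# the second field is empty.
line = "wired||/|Search"

# '+' requires at least one non-pipe character in every field.
plus_pattern = re.compile(
    r"^(?P<f1>[^|]+)\|(?P<f2>[^|]+)\|(?P<f3>[^|]+)\|(?P<f4>[^|]+)"
)

# '*' also allows a field to be empty.
star_pattern = re.compile(
    r"^(?P<f1>[^|]*)\|(?P<f2>[^|]*)\|(?P<f3>[^|]*)\|(?P<f4>[^|]*)"
)

plus_match = plus_pattern.match(line)
star_match = star_pattern.match(line)

print(plus_match)              # None: the empty second field cannot match
print(star_match.groupdict())  # the empty field is captured as ''
```

Inside a character class the pipe does not need escaping, so `[^|]` and `[^\|]` behave the same.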
Thank you! This definitely caught the majority of the delimited fields. Now I just need to make sure each event in the logs is in the same exact format.
I'm still getting a few small errors: one pair of pipes isn't being caught properly, and the user agent string in the last field is cut off after the first few letters.
But most importantly, I'm not getting the same problems as before, which is progress!