I'm trying to make sense of the default access-extractions transform so that I can modify it a bit. I've been digging through Splunk Answers and the online Admin Manual, in particular the "Use the Field transformations page in Manager" page.
Abbreviated version of the default access-extractions regex:
I see that nspaces is another transform, though I'm not sure what the :clientip means, for example. Basically I want to prepend some fields to the expression.
My syslog-ng log output is the same as the common Apache access log, but with a few more fields at the start of each log line. When I simply clone the access-extractions transform and change nothing except the Name field, it kicks back "Please enter all required fields", indicating in red that Event Format is the required field. When I look at the default access-extractions transform (or any of the others), the Event Format field is empty, so it doesn't give me much to go on. Would those names (:client, :ident, :user, etc.) be an indication that I need to do something like clientip::$1 ident::$2 user::$3 etc.?
Thanks in advance!
You mentioned the message "Please enter all required fields". Are you trying to edit these regexes from the UI? If so, I'm guessing that these modular regular expressions will fail because the UI doesn't understand them. With regexes this complicated, I would edit the transforms.conf file directly.
I've built upon these before. Like gkanapathy said, there are some docs in the system/default/transforms.conf file itself, and yeah, it's rather ugly.
That said, with a little patience, it's not too bad to figure out what's going on.
Basically, these take the form [[<transform_stanza_name>:<field_name>]]. So in the example you asked about, [[nspaces:clientip]] means: use the nspaces transformer (which simply means "no spaces", pretty simple) and extract the match as a field named clientip. (You may also notice that some of these take field names, while others have the field names built into the transformers themselves.)
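To make the [[stanza:field]] idea concrete, here is a rough Python sketch of how such references could be expanded into one plain regex. This is only an illustration, not Splunk's actual implementation: the STANZAS table, the nspaces definition in it, and the expand helper are all assumptions made up for the example, and Python spells named groups (?P<name>...) where PCRE accepts (?<name>...).

```python
import re

# Hypothetical stand-ins for transforms.conf stanzas. The real definitions
# live in etc/system/default/transforms.conf and may differ from these.
STANZAS = {
    "nspaces": r"[^\s]+",
}

# Matches [[name]] or [[name:field]].
REF = re.compile(r"\[\[([\w-]+)(?::(\w+))?\]\]")


def expand(pattern, stanzas, depth=0):
    """Replace each [[name]] / [[name:field]] reference with its regex."""
    if depth > 10:
        raise ValueError("references nested too deeply")

    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(stanzas[name], stanzas, depth + 1)
        if field:
            # A field name turns the referenced regex into a named capture.
            return f"(?P<{field}>{body})"
        return f"(?:{body})"

    return REF.sub(substitute, pattern)


modular = r"^[[nspaces:clientip]]\s+[[nspaces:ident]]\s+[[nspaces:user]]"
expanded = expand(modular, STANZAS)
fields = re.match(expanded, "10.1.2.3 - alice extra").groupdict()
print(expanded)
print(fields)
```

The point is only that each reference is textual substitution plus an optional named group, which is why the expanded regexes get so long.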
Also, the "\s++" seemed really weird to me at first. As it turns out, this is normal PCRE syntax (though not all regex engines support it) called a possessive quantifier: once it has matched, the engine will not backtrack into it. (It is closely related to atomic grouping; \s++ behaves like (?>\s+).) For most purposes, think of it as a slightly faster "\s+", but I don't recommend that you start using it yourself unless you read up on it. (I've gotten bitten by this a few times.)
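Here is a small sketch of how the no-backtracking behavior can bite. It uses \w and \d rather than \s so the difference is visible on a short string. Since not every engine accepts the ++ syntax, the example emulates it with the well-known lookahead-plus-backreference trick; in PCRE you would simply write \w++\d. The sample text is made up.

```python
import re

text = "abc1"

# Greedy: \w+ first grabs "abc1", then gives back the "1" so \d can match.
greedy = re.search(r"\w+\d", text)

# Possessive behavior, emulated so the sketch runs on engines without ++:
# the lookahead captures the longest run of \w and, once it has succeeded,
# the engine cannot go back and shorten it. Nothing is left for \d to match.
possessive = re.search(r"(?=(\w+))\1\d", text)

print(greedy.group(0) if greedy else None)
print(possessive)
```

The greedy version finds a match and the possessive one does not, even though the two patterns look nearly identical.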
The other thing I struggled with was the fact that the default access-extractions contained so many helpful field extractions already that I really didn't want to try to rewrite all of it into my own "regular style" regex. Writing a regex that complicated by hand can be pretty daunting.
Take a look at the bc_uri transformer. Who wants to recreate that beast?
REGEX = (?<uri>[[bc_domain:uri_]]?+(?<uri_path>[[uri_root]]?[[uri_seg]]*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*))?)
REGEX = (?<uri>(?<domain>\w++://[^/\s"]++)?+(?<uri_path>/++(?<root>(?:\\"|[^\s\?/"])++)/++?(?:\\"|[^\s\?/"])*+/++*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*))?)
Keep in mind that bc_uri is just one portion of the entire access-request transformer, which is part of the even larger access-extractions transformer. All of which leads me to believe you can make your own by building on top of access-extractions, like so:
[my-custom-access-extractions]
REGEX = [[access-extractions]]\s++[[nspaces:my_trailing_field]]
Doh, just realized this may not work for you. The access-extractions transformer starts with a ^ (start of line). However, you should still be able to copy the entire REGEX and stick your extra fields after the ^ and before the [[nspaces:clientip]]. Seems like it's worth a try.
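To show why the ^ matters, here is a rough Python sketch using plain named groups instead of modular references. Everything in it is hypothetical: base is a heavily simplified stand-in for the start of the stock pattern, the log_host and log_facility fields and the sample line are invented, and prepend_after_anchor is just a helper made up for the illustration.

```python
import re

# Simplified, hypothetical stand-in for the start of the stock pattern.
base = r"^(?P<clientip>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)"

# Hypothetical extra leading fields added by the logging setup.
prefix = r"(?P<log_host>\S+)\s+(?P<log_facility>\S+)\s+"

# Invented sample line: two extra fields, then the usual leading fields.
sample = "web01 local7 10.1.2.3 - alice"


def prepend_after_anchor(pattern, extra):
    """Insert extra fields right after the leading ^ of a pattern."""
    if not pattern.startswith("^"):
        raise ValueError("expected the pattern to start with ^")
    return "^" + extra + pattern[1:]


# Naive concatenation leaves ^ in the middle of the pattern, where it can
# no longer match once the prefix has consumed some characters.
naive = re.match(prefix + base, sample)

combined = prepend_after_anchor(base, prefix)
fields = re.match(combined, sample).groupdict()

print(naive)
print(fields)
```

The same reasoning applies to the modular form: referencing the whole anchored transform after your own fields puts the ^ mid-pattern, whereas copying the REGEX and inserting the new fields after the ^ keeps the anchor where it belongs.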
If you post some examples of your modified format, I'm guessing that someone will help you out with getting a working regex (modular regex or otherwise)...
No. These terms are modular regexes that refer to other regular expressions defined in transforms.conf. There is no documentation on them other than the etc/system/default/transforms.conf file itself. There is also a tool, bin/pcregextest in the Splunk install, that lets you use and test them. I would recommend that most people avoid using or looking at these or trying to do anything with them, and simply use plain PCRE regex.