Hi,
I would appreciate some orientation on the best way to use regular expressions to define transforms. My basic confusion is that regular expressions are usually defined as recognizers, i.e. F(str)->(true, false) depending on whether there is a match. However, to define transforms we are using regular expressions as transformational functions, e.g. F(str)->(another string).
I have come across several possible approaches as to how to use regexes as transformational functions:
-- in the Search manual, the rex command has a "sed" mode that uses special Perl syntax to transform strings.
-- regular-expressions.info describes features such as "backreferences" and "lookarounds" which seem to optionally "capture" or "keep" values ... the language suggests functional uses but unfortunately the writing is not precise.
-- the Splunk web interface and examples in the manuals seem to use features from Perl Compatible Regular Expressions (e.g. (?<FIELDNAME>...)) to extract substrings.
What approach do you prefer? I am happy to learn another grammar, but would prefer to learn one in particular!
Thanks for your help!
Peter
Splunk uses PCRE-compatible regex.
For instance, here is how to change the index to which data is written, from the docs:
http://docs.splunk.com/Documentation/Splunk/latest/admin/Setupmultipleindexes
Route specific events to a different index
Edit props.conf
Add this stanza to $SPLUNK_HOME/etc/system/local/props.conf:
[windows_snare_syslog]
TRANSFORMS-index = AppRedirect
This directs events of windows_snare_syslog sourcetype to the AppRedirect stanza in transforms.conf.
Edit transforms.conf
Add this stanza to $SPLUNK_HOME/etc/system/local/transforms.conf:
[AppRedirect]
REGEX = MSWinEventLog\s+\d+\s+Application
DEST_KEY = _MetaData:Index
FORMAT = applogindex
This stanza processes the events directed here by props.conf. Events that match the regex, by containing the string "Application" in the specified location, get routed to the alternate index, "applogindex". All other events route to the default index.
In this case, the transformation piece is FORMAT, which specifies the value that gets written to DEST_KEY.
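To see how the REGEX and FORMAT pieces interact, here is a minimal Python sketch of the routing decision. It is not how Splunk is implemented: the route_index function, the "main" default index name, and the sample events are assumptions made up for illustration, and Python's re module stands in for PCRE (this particular pattern should behave the same in both).

```python
import re

# REGEX and FORMAT copied from the [AppRedirect] stanza above.
REGEX = re.compile(r"MSWinEventLog\s+\d+\s+Application")
FORMAT = "applogindex"


def route_index(event: str, default_index: str = "main") -> str:
    """Return the index name an event would be routed to (illustrative only)."""
    # The regex acts as a recognizer: it only decides whether the transform applies.
    if REGEX.search(event):
        # FORMAT supplies the value written to the destination key.
        return FORMAT
    return default_index


# Hypothetical sample events, not real Snare output.
app_event = "host1 MSWinEventLog 1 Application 42 some message"
sec_event = "host1 MSWinEventLog 1 Security 43 some message"

print(route_index(app_event))  # applogindex
print(route_index(sec_event))  # main
```

So here the regex is still only a recognizer; the "transformation" is the act of writing FORMAT into the destination key when the recognizer says yes.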
Another good example is in the docs on how to anonymize data:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles
I think it is a common approach to use regex(?
Back-references are rather straightforward; see http://www.regular-expressions.info/brackets.html
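As a rough illustration of capture groups and back-references, here is a small Python sketch. Note that Python's re module is not PCRE: it writes named groups as (?P<name>...) where the Splunk examples use (?<name>...), and it writes references in the replacement as \1 where FORMAT uses $1. The sample strings are made up.

```python
import re

line = "user=peter action=login"  # hypothetical input

# Numbered groups: \1 and \2 in the replacement refer to what each
# parenthesised group captured.
swapped = re.sub(r"user=(\w+) action=(\w+)", r"action=\2 user=\1", line)
print(swapped)  # action=login user=peter

# Named groups: extract substrings by name instead of by position.
m = re.search(r"user=(?P<user>\w+) action=(?P<action>\w+)", line)
fields = m.groupdict() if m else {}
print(fields)  # {'user': 'peter', 'action': 'login'}

# A back-reference inside the pattern itself: \1 must match the same
# text that group 1 captured, so this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))  # is
```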
This answer was helpful, thanks, but it didn't really answer my question, which I will do here:
A precise but long description of PCRE regex is at http://www.pcre.org/pcre.txt.
Viewed as a transformation function, the output of a regular expression is:
- the input string
- minus the substrings matched by the regex
- with each matched substring replaced by the output of a FORMAT argument, in which substrings 'captured' by the regex can be referenced either with $n or by the names of the capture subgroups.
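The model in the list above can be sketched in Python with re.sub, which replaces each matched substring with a template that may refer to captured groups. This is only an analogy under stated assumptions: Python's re is not PCRE, it writes group references as \1 rather than $1, and the sample event, the pattern, and the XXXX mask are invented for illustration. It shows sed-style substitution and is not a statement of how Splunk applies FORMAT internally.

```python
import re

# Hypothetical event containing a value we want to mask.
event = "session=abc123 card=1234567890123456 status=ok"

# Group 1 captures the last four digits so that they can be kept.
pattern = re.compile(r"card=\d{12}(\d{4})")

# Recognizer use: F(str) -> bool
is_match = pattern.search(event) is not None

# Transform use: F(str) -> str
# Output = input, minus the matched substring, plus the template with
# \1 filled in from the capture group.
masked = pattern.sub(r"card=XXXX\1", event)

print(is_match)  # True
print(masked)    # session=abc123 card=XXXX3456 status=ok
```

The same compiled pattern serves both roles; the only difference is whether you ask it "did you match?" or "what does the string look like after substitution?".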
The tool at http://gskinner.com/RegExr/ and the doc above helped me figure this out. I hope I'm right!