Splunk Search

Best approach to define transforms with regular expressions?

peterweinstein
Explorer

Hi,

I would appreciate some orientation on the best way to use regular expressions to define transforms. My basic confusion is that regular expressions are usually defined as recognizers, i.e. F(str) -> {true, false} depending on whether there is a match. To define transforms, however, we use regular expressions as transformational functions, e.g. F(str) -> (another string).

I have come across several possible approaches to using regexes as transformational functions:

-- in the Search manual, the rex command has a "sed" mode that uses special Perl syntax to transform strings.

-- regular-expressions.info describes features such as "backreferences" and "lookarounds" which seem to optionally "capture" or "keep" values ... the language suggests functional uses but unfortunately the writing is not precise.

-- the Splunk web interface and examples in the manuals seem to use features from Perl Compatible Regular Expressions (e.g. (?<FIELDNAME>...)) to extract substrings.

What approach do you prefer? I am happy to learn another grammar, but would prefer to learn one in particular!

Thanks for your help!

Peter


jbsplunk
Splunk Employee

Splunk uses PCRE-compatible regular expressions.

For instance, here is how to change the index to which data is written, taken from the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Setupmultipleindexes

Route specific events to a different index 

Edit props.conf

Add this stanza to $SPLUNK_HOME/etc/system/local/props.conf:

[windows_snare_syslog]
TRANSFORMS-index = AppRedirect

This directs events of windows_snare_syslog sourcetype to the AppRedirect stanza in transforms.conf.

Edit transforms.conf

Add this stanza to $SPLUNK_HOME/etc/system/local/transforms.conf:

[AppRedirect]
REGEX = MSWinEventLog\s+\d+\s+Application
DEST_KEY = _MetaData:Index
FORMAT = applogindex

This stanza processes the events directed here by props.conf. Events that match the regex, by containing the string "Application" in the specified location, get routed to the alternate index, "applogindex". All other events route to the default index. 

In this case, the transformation piece is FORMAT, which specifies the value written to DEST_KEY (here, the index name) when REGEX matches.
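
As a rough illustration of what the [AppRedirect] stanza does conceptually, here is a small Python sketch. It is not Splunk code: the function name route and the default index name "main" are assumptions made up for this example, and Python's re module is only similar to PCRE, not identical.

```python
import re

# Hypothetical stand-ins for the [AppRedirect] stanza settings.
REGEX = re.compile(r"MSWinEventLog\s+\d+\s+Application")
FORMAT = "applogindex"
DEFAULT_INDEX = "main"  # assumed name of the default index


def route(event: str) -> str:
    """Return the index this model would write the event to.

    If REGEX matches anywhere in the event, the destination is set to
    FORMAT; otherwise the event keeps the default index.
    """
    return FORMAT if REGEX.search(event) else DEFAULT_INDEX


print(route("MSWinEventLog\t1\tApplication\t42"))  # applogindex
print(route("MSWinEventLog\t1\tSecurity\t42"))     # main
```

The regex acts purely as a recognizer here; the "transformation" is that a match causes the constant FORMAT value to be written to the destination key.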

Another good example is in the docs on how to anonymize data:

http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles


cvajs
Contributor

I think the common approach is to use regex(?<name>regex)regex to search for and define things.

Back-references are rather straightforward; see http://www.regular-expressions.info/brackets.html


peterweinstein
Explorer

This answer was helpful, thanks, but it didn't really answer my question, so I will do that here:

A precise but long description of PCRE regex is at http://www.pcre.org/pcre.txt.

To understand regular expressions as transformation functions, think of the output of a regex transform as:
- the input string,
- minus the substrings matched by the regex,
- with each match replaced by the output of a FORMAT argument, in which substrings 'captured' in the regex can be referenced either with $n or by the names of the capture subgroups.

The tool at http://gskinner.com/RegExr/ and the doc above helped me figure this out. I hope I'm right!
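
Here is a minimal Python sketch of the model I describe above, in case it helps someone else. The helper names apply_format and extract_fields and the sample event are made up for illustration. Python's re module is not PCRE (for example, named groups are written (?P<name>...) rather than (?<name>...)), and this mirrors sed-style replacement; I have not verified that transforms.conf behaves identically for every DEST_KEY.

```python
import re


def apply_format(regex: str, fmt: str, text: str) -> str:
    """Replace every match of `regex` in `text` with `fmt`.

    `fmt` uses Splunk-style $1, $2, ... to refer to captured groups.
    """
    # Translate $n into Python's \g<n> group-reference syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", fmt)
    return re.sub(regex, template, text)


def extract_fields(regex: str, text: str) -> dict:
    """Return the named captures of the first match, or {} if no match."""
    match = re.search(regex, text)
    return match.groupdict() if match else {}


# Made-up event, for illustration only.
event = "GET /x?SessionId=3A1785URH117BEA&y=1"

masked = apply_format(r"(SessionId=)\w+(\w{4})", "$1########$2", event)
print(masked)  # GET /x?SessionId=########7BEA&y=1

print(extract_fields(r"SessionId=(?P<session>\w+)", event))
```

The first call shows the "string in, string out" use (match, then rewrite with captures); the second shows the extraction use, where the named capture groups become field names.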
