Splunk Search

Best approach to define transforms with regular expressions?

peterweinstein
Explorer

Hi,

I would appreciate some orientation on the best way to use regular expressions to define transforms. My basic confusion is that regular expressions are usually defined as recognizers, i.e. F(str) -> {true, false}, depending on whether there is a match. To define transforms, however, we are using regular expressions as transformation functions, i.e. F(str) -> (another string).

I have come across several possible approaches as to how to use regexes as transformational functions:

-- in the Search manual, the rex command has a "sed" mode that uses special Perl syntax to transform strings.

-- regular-expressions.info describes features such as "backreferences" and "lookarounds" which seem to optionally "capture" or "keep" values ... the language suggests functional uses but unfortunately the writing is not precise.

-- the Splunk web interface and examples in the manuals seem to use features from Perl Compatible Regular Expressions, e.g. (?<FIELDNAME>...), to extract substrings.

What approach do you prefer? I am happy to learn another grammar, but would prefer to learn one in particular!

Thanks for your help!

Peter

1 Solution

jbsplunk
Splunk Employee

Splunk uses PCRE-compatible regular expressions.

For instance, to change the index to which data is written, from here in the docs:

http://docs.splunk.com/Documentation/Splunk/latest/admin/Setupmultipleindexes

Route specific events to a different index 

Edit props.conf

Add this stanza to $SPLUNK_HOME/etc/system/local/props.conf:

[windows_snare_syslog]
TRANSFORMS-index = AppRedirect

This directs events of windows_snare_syslog sourcetype to the AppRedirect stanza in transforms.conf.

Edit transforms.conf

Add this stanza to $SPLUNK_HOME/etc/system/local/transforms.conf:

[AppRedirect]
REGEX = MSWinEventLog\s+\d+\s+Application
DEST_KEY = _MetaData:Index
FORMAT = applogindex

This stanza processes the events directed here by props.conf. Events that match the regex, by containing the string "Application" in the specified location, get routed to the alternate index, "applogindex". All other events route to the default index. 

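In this case, REGEX only decides which events qualify. The transformation itself is carried by DEST_KEY, which names what gets rewritten (here, the index metadata), and FORMAT, which gives the value written to it.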
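
As a rough illustration of how the recognizer and the transform fit together, here is a minimal Python sketch of the logic in the [AppRedirect] stanza. It is not Splunk code: the route function, the sample events, and the default index name "main" are assumptions made up for the example, and Python's re module stands in for PCRE.

```python
import re

# Hypothetical stand-ins for the settings in the [AppRedirect] stanza.
REGEX = re.compile(r"MSWinEventLog\s+\d+\s+Application")
DEST_KEY = "_MetaData:Index"
FORMAT = "applogindex"
DEFAULT_INDEX = "main"  # assumed name for the default index


def route(event: str) -> dict:
    """Return the event's metadata after applying the transform."""
    meta = {DEST_KEY: DEFAULT_INDEX}
    if REGEX.search(event):      # recognizer step: does the event match?
        meta[DEST_KEY] = FORMAT  # transform step: write FORMAT to DEST_KEY
    return meta


# Made-up sample events, tab-separated like Snare syslog output.
app_event = "MSWinEventLog\t1\tApplication\t42\tsome message"
sec_event = "MSWinEventLog\t1\tSecurity\t43\tsome message"

print(route(app_event))
print(route(sec_event))
```

Only the first sample event matches, so only its index value is rewritten; the second keeps the default.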

Another good example is in the docs on how to anonymize data:

http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles
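
To give a feel for masking with capture groups, here is a small Python sketch of the idea. The SessionId field, the pattern, and the sample line are assumptions chosen for illustration, and Python's re module is used in place of Splunk's PCRE engine. The text before and after the sensitive value is captured and written back, while the value itself is replaced with a fixed mask.

```python
import re

# Assumed pattern: keep everything before "SessionId=" (group 1) and the
# last four characters of the ID plus the rest of the line (group 2).
PATTERN = re.compile(r'^(.*)SessionId=\w+(\w{4}[&"].*)$', re.MULTILINE)

# The replacement plays the role of FORMAT; \1 and \2 are Python's spelling
# of the captured groups.
REPLACEMENT = r"\1SessionId=########\2"


def anonymize(line: str) -> str:
    """Mask all but the last four characters of the session ID."""
    return PATTERN.sub(REPLACEMENT, line)


sample = "GET /x?SessionId=3A2B1C9F8E7D&user=bob"  # made-up log line
print(anonymize(sample))
# prints: GET /x?SessionId=########8E7D&user=bob
```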


cvajs
Contributor

I think the common approach is to use regex(?regex)regex to search for or define things.

Back-references are rather straightforward; see http://www.regular-expressions.info/brackets.html
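
For what it's worth, here is a short Python sketch of the two places a back-reference can show up: inside the pattern, and in the replacement text. The sample strings are made up, and Python's re module is only an approximation of PCRE. In particular, Python spells named groups (?P<name>...), whereas PCRE also accepts (?<name>...).

```python
import re

# Back-reference inside the pattern: \1 must repeat what group 1 matched,
# so this recognizes a doubled word such as "is is".
doubled = re.compile(r"\b(\w+) \1\b")
has_double = bool(doubled.search("this is is a test"))
print(has_double)

# Back-reference in the replacement: captured text is reused in the output.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "user=bob")
print(swapped)

# Named capture group: the captured substring is retrieved by name.
named = re.search(r"user=(?P<username>\w+)", "user=bob")
print(named.group("username"))
```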

peterweinstein
Explorer

This answer was helpful, thanks, but it didn't really answer my question, so I will try to answer it here:

A precise but long description of PCRE regex is at http://www.pcre.org/pcre.txt.

Viewed as a transformation function, a regular expression produces:
- the input string,
- minus the substrings matched by the regex,
- with those substrings replaced by the output of the FORMAT argument, which can include substrings 'captured' in the regex, referenced either by $n or by the names of capture groups.

The tool at http://gskinner.com/RegExr/ and the doc above helped me figure this out. I hope I'm right!
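
The description above can be sketched in Python roughly as follows. This only models the description as written, which reads like sed-style substitution. The helper names expand_format and transform are hypothetical, only $n references are handled, and whether a given Splunk transform replaces just the matched substring or the whole destination value is something to check against the transforms.conf documentation.

```python
import re


def expand_format(match: re.Match, fmt: str) -> str:
    """Expand $1, $2, ... in fmt using the match's capture groups."""
    return re.sub(
        r"\$(\d+)",
        lambda ref: match.group(int(ref.group(1))) or "",
        fmt,
    )


def transform(regex: str, fmt: str, text: str) -> str:
    """Replace each substring matched by regex with the expanded fmt."""
    return re.sub(regex, lambda m: expand_format(m, fmt), text)


# The matched substring "555-1234" is removed and replaced by the FORMAT
# output, which reuses the two captured groups in swapped order.
result = transform(r"(\d{3})-(\d{4})", "$2/$1", "call 555-1234 now")
print(result)
# prints: call 1234/555 now
```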
