Splunk Enterprise

How many escapes "\" do I need in .conf file regex?

marycordova
SplunkTrust

For example, in props.conf, TIME_PREFIX requires a regex. My regex seems to work in my search, but it does not seem to be applied to my data via the .conf file.


xpac
SplunkTrust

Just to make sure, because this is probably the most commonly confused topic when using regexes in Splunk.

First, create a clean regex in regex101.com - that means, no unnecessary escapes.

What is an unnecessary escape backslash? If you remove it, your regex still works, and the explanation panel on the right doesn't change for that part, then it was unnecessary.

Example 1: \" can be used in regex, but the backslash is unneeded. The quote has no special meaning in regex, so " has exactly the same effect.
Example 2: If you want to match a literal asterisk, it has to be escaped as \*, because the asterisk has a special meaning in regex (it is a quantifier).
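A quick way to run the same "is this escape necessary?" check outside regex101 is sketched below in Python. It assumes Python's re module behaves like Splunk's PCRE engine for these two characters (it does for " and *, but other escapes can differ between engines), and the sample event is made up for illustration.

```python
import re

# Hypothetical sample event, made up for illustration.
event = 'user="alice" expr=3*4'


def first_match(pattern: str, text: str):
    """Return the text matched by pattern, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Example 1: the backslash before " is unnecessary.
# Both patterns match exactly the same text.
with_escape = first_match(r'user=\"\w+\"', event)
without_escape = first_match(r'user="\w+"', event)

# Example 2: * is a quantifier, so a literal asterisk must be escaped.
literal_star = first_match(r'3\*4', event)  # the literal text 3*4
quantifier = first_match(r'3*4', event)     # "zero or more 3s, then a 4"

print(with_escape, without_escape, literal_star, quantifier)
```

Removing the backslash in Example 1 changes nothing, which is what marks it as unnecessary. Removing it in Example 2 silently changes what is matched (only the 4).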

Now, once you have your clean regex, just use it as-is in any .conf file. It will work.

However, the | rex and | regex commands are different (well, anything in SPL that takes a regex is).

Why? The SPL parser also treats some characters as special (e.g. quotes), and it uses the same escape character as regex: the backslash.

So, to avoid strange behaviour when using regexes in SPL, you need to escape them again.

Example 1: You want to match Domain\user in your event. The regex would be Domain\\user. In SPL, this would have to be Domain\\\\user: every backslash in the regex needs its own escape backslash.
Example 2: You want to match "Domain\user". The regex would simply be "Domain\\user", since quotes have no special meaning in regex. However, in SPL this would have to be \"Domain\\\\user\", for the reasons above and because the quotes have a special meaning.

Addendum: When you use the last regex in the rex command, it gets put into quotes, like | rex "\"Domain\\\\user\"". Crazy, right?
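The two layers (SPL parsing, then regex matching) can be modelled in a few lines of Python. spl_unescape is a hypothetical helper, not a Splunk API: it assumes, as described above, that the SPL parser turns \\ into \ and \" into " inside a quoted argument, and it leaves any other backslash untouched (that last part is a guess). Python's re stands in for Splunk's regex engine.

```python
import re


def spl_unescape(spl_string: str) -> str:
    # Hypothetical model of the SPL quoted-string layer (not Splunk code):
    # a backslash followed by \ or " is dropped, keeping the second character.
    # Any other backslash is passed through unchanged (an assumption).
    return re.sub(r'\\([\\"])', r'\1', spl_string)


# Example 1: SPL argument -> regex -> matches the event text Domain\user
regex1 = spl_unescape(r'Domain\\\\user')
print(regex1)                                   # Domain\\user
print(bool(re.search(regex1, r'Domain\user')))  # True

# Example 2: the quoted version, as written inside | rex "..."
regex2 = spl_unescape(r'\"Domain\\\\user\"')
print(regex2)                                     # "Domain\\user"
print(bool(re.search(regex2, r'"Domain\user"')))  # True
```

Running the SPL string through the model once yields exactly the regex from each example, which is why every regex backslash has to be doubled and every quote escaped.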


PS: I know that SPL sometimes works even without the proper number of escape backslashes, but sometimes it doesn't. I still haven't found out why. If you have the Splunk source code, send me a mail 😉

PPS: As with everything in Splunk, there's likely that one setting in that one .conf file where this does not apply, because $consistency. If I had to bet, I'd bet on something related to Windows/PowerShell 😈


marycordova
SplunkTrust

Thank you @xpac for the correct answer:

In .conf files, the regex does not need the extra escaping that SPL requires.

For example, you could write a regex like the following, and it would work in an SPL search, but only because it is embedded within quotes ("") and passed through the SPL parser:

\"time\":\s+\"

In the .conf files you do not need the escapes for the quotes:

"time":\s+"

You do still need to escape backslashes themselves, however: to match \ write \\, and to match \\ write \\\\ (common in Windows events).
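To illustrate that a .conf regex is used exactly as written, here is a small Python sketch. Python's re stands in for Splunk's regex engine and the sample events are made up; raw strings (r'...') keep Python itself from touching the backslashes, which mirrors a .conf file where no SPL parser is involved.

```python
import re

# Regexes exactly as they would appear in a .conf file: no SPL layer.
time_prefix = r'"time":\s+"'
spl_style = r'\"time\":\s+\"'  # quote-escaped form: works, but the escapes are unnecessary

json_event = '{"time": "2024-04-01T12:00:00Z"}'
m = re.search(time_prefix, json_event)
timestamp_start = json_event[m.end():]  # with TIME_PREFIX, the timestamp follows the match

# Matching literal backslashes still needs regex-level escaping:
# \\ matches one backslash, \\\\ matches two (e.g. a UNC path).
windows_event = r'Account Name: CORP\alice Share: \\fileserver\share'
one = re.search(r'CORP\\alice', windows_event).group(0)
two = re.search(r'\\\\fileserver', windows_event).group(0)

print(timestamp_start)
print(one, two)
```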

Rule of thumb, check your regex in regex101.com and remove any unnecessary escapes 🙂

emottola
Explorer

Simply echoing xpac's excellent solution in my own words, since I've run into this too; perhaps a different wording will help future readers.

 

When you write regex in a |rex command, backslashes must be used carefully, because there are multiple levels of escaping.

The first is SPL-level escaping: the rex command accepts the regular expression as a quoted argument,
which means you must escape the " character with \ so your regex string can include it. Call this the SPL parsing step.
For this to work, \ must also be a special character for the SPL parser.
All this only applies to SPL commands like |rex and |regex that accept the regular expression as a string bounded by quotes.
For these, though, every " and \ must be escaped so that it appears literally in the actual regular expression.
For the regular expression itself, " is not a special character, but \ is.
So if you need to match a literal \ in the message, your regex must specify \\, and your SPL must specify \\\\.
The SPL parser unescapes this once to \\, which the regex engine treats as a literal \.
This is very annoying when your message contains escaped JSON, because the string \" appears in the message itself.
To match that in a |rex command, you'll need \\\\\".
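The rule above (double every backslash, then escape every double quote) is mechanical enough to script. regex_to_spl below is a hypothetical helper, not part of Splunk; it assumes the SPL quoted-string rules described in this answer and only produces the text that goes between the quotes of the rex or regex argument.

```python
def regex_to_spl(regex: str) -> str:
    """Return the text to place inside the quotes of a | rex or | regex argument."""
    # Double the backslashes first; otherwise the backslashes added
    # for the quotes in the next step would get doubled as well.
    escaped = regex.replace("\\", "\\\\")
    # Then escape every double quote for the SPL parser.
    return escaped.replace('"', '\\"')


# A literal backslash: regex \\ becomes \\\\ in SPL.
print(regex_to_spl(r'\\'))
# The escaped-JSON case: regex \\" becomes \\\\\" in SPL.
print(regex_to_spl(r'\\"'))
```

The order of the two replacements is the design choice that matters: escaping quotes first would produce \\\\\\" (six backslashes) for the JSON case, which no longer matches what the regex needs.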
 
For regexes written in Splunk .conf files, such as field extractions, transforms, LINE_BREAKER, TIME_PREFIX, etc., the regular expression does not need to be escaped again, because no SPL parser interprets it first.

Example:
Message you're searching for:
{\"key\": \"value\"}

Regex you want applied (this is also what works in .conf files):
{\\"key\\": \\"value\\"}

How you must write the SPL command:
| rex field=_raw "{\\\\\"key\\\\\": \\\\\"value\\\\\"}"
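The three strings in this example can be cross-checked in Python. The sketch assumes the SPL unescaping rule described above (\\ to \, \" to "), modelled by a hypothetical spl_unescape helper, and uses Python's re in place of Splunk's regex engine.

```python
import re

# The three strings from the example, as raw strings so Python leaves them alone.
message = r'{\"key\": \"value\"}'                  # event text
conf_regex = r'{\\"key\\": \\"value\\"}'           # regex for .conf files
spl_arg = r'{\\\\\"key\\\\\": \\\\\"value\\\\\"}'  # text inside | rex field=_raw "..."


def spl_unescape(s: str) -> str:
    # Hypothetical model of the SPL parsing step: \\ -> \ and \" -> ".
    return re.sub(r'\\([\\"])', r'\1', s)


# After the SPL parsing step, the rex argument is exactly the .conf regex...
after_spl = spl_unescape(spl_arg)
# ...and that regex matches the whole message.
matched = re.fullmatch(conf_regex, message) is not None
print(after_spl == conf_regex, matched)  # True True
```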