I'm troubleshooting a regex to match against the following data (names and IP addresses are fictional):
Aug 26 10:55:50 10.11.12.13 PRIMUS NT: <Security;F529;NT AUTHORITY\SYSTEM> Logon Failure: Reason:Unknown user name or bad password User Name:bob Domain:10.11.12.13 Logon Type:3 Logon Process:NtLmSsp Authentication Package:NTLM Workstation Name:FAUX Caller User Name:- Caller Domain:- Caller Logon ID:- Caller Process ID:- Transited Services:- Source Network Address:10.11.12.14 Source Port:0
I believe the regex below is a valid PCRE, but Splunk's search complains with the message:
Error in 'SearchOperator:regex': The regex '<Security;[A-Z]\d+;[^\]+' is invalid. missing terminating ] for character class | regex _raw="<Security;[A-Z]\d+;[^\\]+"
When I create a regex for a Splunk search, am I supposed to escape each backslash with an additional backslash? The following search appears to work...
| regex _raw="<Security;[A-Z]\\d+;[^\\\\]+"
On the search line, each backslash must be escaped. In a Field Extraction regex, only a backslash that is meant to match a literal "\" needs to be escaped.
| rex field=UserName "(?<field1>[^\\\\]+)\\\\+(?<field2>[^\\\\\\",]+)$"
Additionally, on the search line, double quotes must be escaped. In some cases, such as the backslash that escapes a double quote, the search line does not require the backslash itself to be escaped. The following example extends the previous one so that optional double quotes around the field values are allowed but are not captured.
| rex field=_raw "UserName=\"?(?<field1>[^\\\\]+)\\\\+(?<field2>[^\\\\\",]+)\"?$"
| rex field=_raw "UserName=\\"?(?<field1>[^\\\\]+)\\\\+(?<field2>[^\\\\\",]+)\\"?$"
That didn't quite answer my question. Please look closely at the difference between my search and Splunk's error message. If I copy and paste the regex you provided, Splunk responds with an error.
Error in 'SearchOperator:regex': The regex '<Security;[A-Z]\d+;[^\]+' is invalid. missing terminating ] for character class
It's behaving as though the double-backslash is first unescaped, then applied to the "]" character as a literal. Note how in the error message, it only shows one backslash, but the character class in my query had two.
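To make that hypothesis concrete, here is a rough sketch in Python. It is only a model: `unescape_search_string` is a hypothetical stand-in for whatever the search parser does to a quoted string (I'm assuming it collapses `\\` to `\` and `\"` to `"`), and Python's `re` module stands in for PCRE, which treats `\]` inside a character class the same way. It is not Splunk's actual implementation.

```python
import re

def unescape_search_string(s: str) -> str:
    # Hypothetical model of one unescaping pass over a quoted search string:
    # a backslash followed by a backslash or a double quote collapses to
    # the second character; every other character is passed through as-is.
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in ("\\", '"'):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def compiles(pattern: str) -> bool:
    # True if the regex engine accepts the pattern.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# What is typed between the quotes on the search line (raw strings, so the
# backslashes here are exactly the ones typed).
typed_two = r"<Security;[A-Z]\d+;[^\\]+"       # two backslashes in the class
typed_four = r"<Security;[A-Z]\\d+;[^\\\\]+"   # every backslash doubled

# What the regex engine would receive under the assumed unescaping pass.
seen_two = unescape_search_string(typed_two)
seen_four = unescape_search_string(typed_four)

sample = r"NT: <Security;F529;NT AUTHORITY\SYSTEM> Logon Failure:"

print(seen_two, compiles(seen_two))
print(seen_four, compiles(seen_four))
print(re.search(seen_four, sample).group(0))
```

Under this model, the two-backslash version reaches the engine as `[^\]+`, where `\]` is an escaped bracket and the class never closes, which is the same pattern shown in the error message. The four-backslash version reaches the engine as `[^\\]+`, which compiles and stops matching at the backslash in `NT AUTHORITY\SYSTEM`.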