Splunk Enterprise

Regex

praddasg
Path Finder

Hello All,

 

I am not very familiar with regex, but by looking at some old queries I have been able to build one for my needs. I am looking for help understanding how it works in terms of regular expressions and Splunk rex syntax.

So the regex I am using is

| rex field=_raw message="(?<message>.*).request"

for the text

message=abc ff request-id

where I am trying to extract everything after "=" up to "request-id". There could be spaces as well.

  1. I think "<message>" here is the name of the field I want to create.
  2. The wildcard character "*" within the braces indicates everything after "message=".
  3. But I don't understand:
    1. The use of "?". Is this part of Splunk regex syntax, or does it signify anything and everything after "message=", i.e. working along with "*"?
    2. What is the use of the braces here? Do they indicate the section I am trying to parse?
    3. The dot "." after "<message>". Is this Splunk syntax?
    4. The dot "." after the braces. Does it delimit the string that follows the parsed section?
    5. The most confusing part is the use of quotes.
    6. What would the regex be if the text were "message abc ff request-id" and I wanted to parse everything between "message" and "request"?

richgalloway
SplunkTrust

To know whether a regex works or not requires knowing the text it is trying to match.  Please share.

I don't know which regex version Sublime uses, but Splunk uses PCRE.  Also, Splunk is not a text editor so it may behave differently from an editor.

You may find this helpful: https://conf.splunk.com/files/2017/slides/regex-in-your-spl.pdf

 

---
If this reply helps you, an upvote would be appreciated.


praddasg
Path Finder

@richgalloway the text I am trying to match is everything after "=" up to "request", so the complete text is

message=abc ef x request-id

praddasg
Path Finder

Oh, one more thing: the content between "=" and "request" could be any number of letters or digits and can contain multiple spaces as well.


richgalloway
SplunkTrust
"=.*?request"

The question mark limits the scope of the asterisk to the fewest number of characters needed to match the regex.
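
The difference between the greedy and the lazy form can be sketched outside Splunk. This is a minimal illustration using Python's re module rather than Splunk's PCRE engine; for these two quantifiers the behaviour is the same. The sample text is hypothetical: a second "request" was added so that the two patterns give different results.

```python
import re

# Hypothetical sample: "request" appears twice so the two quantifiers differ.
text = "message=abc ef x request-id=42 request-time=7"

# Greedy: .* takes as much as it can, so the match runs to the LAST "request".
greedy = re.search(r"=.*request", text).group(0)

# Lazy: .*? takes as little as it can, so the match stops at the FIRST "request".
lazy = re.search(r"=.*?request", text).group(0)

print(greedy)  # =abc ef x request-id=42 request
print(lazy)    # =abc ef x request
```

When "request" occurs only once in the event, both forms match the same text, so the question mark only matters if the terminating word can repeat.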


richgalloway
SplunkTrust

The example rex command is invalid.  The regular expression must be enclosed in quotation marks, like this

| rex field=_raw "message="(?<message>.*).request""

then the embedded quotation marks must be escaped, like this

| rex field=_raw "message=\\\"(?<message>.*).request\\\""

1.  <message> denotes the name of the capture group and is the name of the field the matching text will fill.

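2.  The regex wildcard character is . (full stop), which matches any single character (except a newline, by default).  The asterisk (*) is a quantifier that means "any number of these".  The sequence .* ("dot-star") therefore matches any run of characters, which in practice means "everything from here on".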

3a. The "?" means nothing by itself in this context.  It is part of the "(?<message>" sequence, which starts a named capture group.

b. There are no braces in this regex so do you mean the parentheses or the angle brackets?  In this context, the parentheses denote a capture group and the angle brackets denote the name of the current capture group.  In Splunk, this becomes a field name.

c. As mentioned in 2 above, the dot is the wildcard character.  It is standard regex, not specific to Splunk.

d. See 2 and c.

e.  Quotation marks are not special characters in regex.  They're just another character to match.  On the other hand, embedded quotation marks in the rex command ARE confusing.  They require 3 escape characters to get through the various parsers to the regex engine.

f. Something like "message (?<message>.*) request".  The capture group needs a name so that rex has a field to put the match into.
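
The points above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: it uses Python's re module, not Splunk, and Python spells a named group (?P<name>...) where Splunk's PCRE also accepts (?<name>...). The variable names are made up for the illustration, and the sample events are the ones from the question.

```python
import re

# Sample event from the question.
event = "message=abc ff request-id"

# (?P<message>...) is a named capture group; the name becomes the field name.
# .* inside the group matches any run of characters.
pattern = re.compile(r"message=(?P<message>.*).request")

match = pattern.search(event)
captured = match.group("message")
print(repr(captured))  # 'abc ff'

# The dot just before "request" consumes exactly one character (here the
# space), which is why that space is not part of the captured value.

# Point f: the same idea when a space follows "message" instead of "=".
event_f = "message abc ff request-id"
match_f = re.search(r"message (?P<message>.*) request", event_f)
print(repr(match_f.group("message")))  # 'abc ff'

# Point e: a quotation mark inside a regex is a literal character, so a
# pattern containing quotes cannot match text that has none.
quoted = re.search(r'message="(?P<message>.*).request"', event)
print(quoted)  # None
```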

If you pass a regex string into https://regex101.com the site will explain what each character means.


praddasg
Path Finder

Hello @richgalloway 

Thank you for taking the time to explain. I really appreciate the effort you invested in this.

  1. Interestingly, this one works:

| rex field=_raw message="(?<message>.*).request"

So does this one:

| rex field=_raw "message="(?<message>.*).request""

but not this one:

"message=\\\"(?<message>.*).request\\\""

When I say it works, I mean it gives the desired result; when I say it does not work, I mean it does not give the desired result. None of the three produced a syntax error.

The one with the escaped quotation marks only returns the text before the first space, i.e. for "message=abc efg request-id" it only prints "abc". Does this have anything to do with the Splunk version?

2. Regarding "The sequence .* ("dot-star") means "everything from here on"": I am assuming this is regex behavior and has nothing to do with Splunk itself. So I tried the concept in the Sublime Text editor to see what happens. I used

message=Error translating Grubhub webhook order: The location for this order cannot be found request-id

and tried to replace message=.* with, let's say, new. The entire line got wiped out and replaced with new. I was expecting something like message=new. I even tried message="(?.*).request", "message="(?.*).request"", but no changes happened. Is it because Splunk uses different regex logic than the Sublime Text editor?

3. I am still confused about the use of quotation marks. I tried the website you mentioned, but it confused me even more, lol.
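
For the replace experiment in point 2, the result is what a regex replace normally does: the replacement text takes the place of the whole match, and "message=" is part of that match. The sketch below shows this with Python's re.sub, which is an assumption for illustration only; it is neither Sublime Text nor Splunk, and an editor's replacement syntax for backreferences may differ.

```python
import re

line = ("message=Error translating Grubhub webhook order: "
        "The location for this order cannot be found request-id")

# The whole match is replaced, and "message=" belongs to the match.
whole = re.sub(r"message=.*", "new", line)
print(whole)  # new

# Capture the prefix and put it back with a backreference (\1).
keep_prefix = re.sub(r"(message=).*", r"\1new", line)
print(keep_prefix)  # message=new

# Capture both ends and replace only the text between them.
keep_both = re.sub(r"(message=).*?( request)", r"\1new\2", line)
print(keep_both)  # message=new request-id
```

A capture group marks what to keep or extract; anything matched outside a group is still consumed by the replace.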

