Re: Regex: I want to match a string and then extract the next lines until matching another string in Splunk Search

Regex: I want to match a string and then extract the next lines until matching another string

edrivera3 — Tue, 21 Apr 2015 00:29:09 GMT

I have the following in all my events:

ERROR=40392
"This error ... blah...blah....
... ... .. ... ... .. ... ..... ..
... .. ... ... .. . ..."
END

I would like to extract everything between "ERROR=40392" and "END" into a field. Also, the error number changes for each event. I would appreciate your help.

Re: Regex: I want to match a string and then extract the next lines until matching another string

stephane_cyrill — Mon, 28 Sep 2020 19:39:11 GMT

Hi edrivera3,
The rex (or regex) command is the best tool for that. Try this, for example, to extract property values and put them in one field:

... | rex max_match=0 field=_raw "HERE YOU PUT YOUR REGEX"

If, like me, you cannot easily write regex, use the IFX (Interactive Field Extractor): go through it as if you wanted to extract the values, and the IFX will generate a regular expression that you can use there.
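With max_match=0, rex keeps matching and stores every match in a multivalue field; the default (max_match=1) keeps only the first. The sketch below approximates that difference with Python's `re` module, assuming Python's engine behaves like Splunk's PCRE for this simple pattern. The event text and the `msg` group name are hypothetical, not from the thread.

```python
import re

# Hypothetical event containing two ERROR ... END blocks on one line.
event = 'ERROR=40392 "disk full" END ERROR=40393 "quota exceeded" END'

# Lazy .+? so each match stops at the first closing quote followed by END.
pattern = re.compile(r'ERROR=\d+\s"(?P<msg>.+?)"\sEND')

# Roughly like max_match=1 (the default): only the first match is kept.
first = pattern.search(event).group("msg")

# Roughly like max_match=0: every match is kept, like a multivalue field.
all_matches = [m.group("msg") for m in pattern.finditer(event)]

print(first)        # disk full
print(all_matches)  # ['disk full', 'quota exceeded']
```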

Re: Regex: I want to match a string and then extract the next lines until matching another string

dflodstrom — Tue, 21 Apr 2015 01:48:46 GMT

For this sample log entry:

ERROR=40392 "This error blah blah" END

You could use rex inline like this (rex defaults to the _raw field unless you specify otherwise):

<your search> | rex "ERROR=\d+\s\"(?<new_field>.+)\"\sEND"

You will end up with: new_field=This error blah blah
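To check what the pattern captures outside Splunk, here is a quick sketch with Python's `re` module, assuming it behaves like Splunk's PCRE for this pattern. Python only accepts the `(?P<name>...)` form of named groups, so the group is written that way here.

```python
import re

sample = 'ERROR=40392 "This error blah blah" END'

# Same pattern as the rex above, using Python's (?P<name>...) group syntax.
pattern = re.compile(r'ERROR=\d+\s"(?P<new_field>.+)"\sEND')

match = pattern.search(sample)
new_field = match.group("new_field") if match else None
print(f"new_field={new_field}")  # new_field=This error blah blah
```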

You can put that into props.conf for a search-time extraction:

EXTRACT-your_extract = ERROR=\d+\s"(?<new_field>.+)"\sEND

Re: Regex: I want to match a string and then extract the next lines until matching another string

rsennett_splunk — Tue, 21 Apr 2015 02:30:56 GMT

edrivera3,

First, let me recommend you check out regex101.com, because it will show you exactly what your regex is capturing and what it's not. It also explains every step of your regex. Very helpful for learning.

Since you mention that the error will have "different numbers", I think it's worth pointing out that regex is pattern matching. Sometimes you will notate literal things like ERROR=, and sometimes you will use representations like \d for a digit and \d+ for one or more digits. It helps to be precise when you can. So even though the numbers differ, if you always have a five-digit error code, the regex for just that part would look like ERROR=\d{5}, which translates to: literally ERROR= followed by exactly five digits, always. In this case, the part you don't want to capture, but want to make sure is present, is: ERROR=\d+\s+\"
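A small sketch of the difference between \d{5} and \d+, using Python's `re` module as a stand-in for Splunk's regex engine. The error codes are made up, and `re.fullmatch` is an assumption made here to test each whole token; rex itself does not anchor the pattern, so an unanchored \d{5} would also match the first five digits of a longer number.

```python
import re

# Hypothetical error codes of different lengths.
codes = ["ERROR=40392", "ERROR=123", "ERROR=4039211"]

# For each code: (matches exactly five digits?, matches one or more digits?)
results = {
    code: (
        re.fullmatch(r"ERROR=\d{5}", code) is not None,
        re.fullmatch(r"ERROR=\d+", code) is not None,
    )
    for code in codes
}

for code, (five, any_len) in results.items():
    print(code, "\\d{5}:", five, "\\d+:", any_len)
```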

This is where it gets tricky: your sample seems to have line breaks. So while it might seem like a good idea to use a dot (which represents any character) and write .+, that would only work for one line of the message, because the dot actually represents any character except newline, and it looks like you have newlines. Here's the trick: there are flags you can apply to the regex (see the regex101 explanation). For example, prefixing your regex with (?i) tells Splunk that you want the regex to be case-insensitive.

In this case you'll use the s flag (shown as /s in some tools, written inline as (?s)). It makes the dot match newlines too, so .+ represents all characters including newline. You code it like this:

(?s)ERROR=\d+\s+\"(?P<myfield>.+)\"\s+END

which says:
Look at this as if everything is a single line
Walk past the following literal characters: ERROR=
Then walk past one or more digits, followed by one or more whitespace characters (spaces or newlines) and a literal double quote
Then create a named capturing group called "myfield"
Inside the field you put one or more characters
But don't include the closing double quote, the one or more whitespace characters that follow, or the literal word END
That last bit anchors the end of the field just before the combination of double quote, whitespace and END. Sometimes you have to be more specific than that (if there are other things in the event that look very similar), but it's fine here if that's really what your data looks like.
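To see why the (?s) flag matters for the multi-line sample from the question, here is a sketch using Python's `re` module, assuming it treats (?s) the same way Splunk's PCRE does (in both, it lets the dot match newlines). The regex is the one given above; only the event text is rebuilt as a Python string.

```python
import re

# The multi-line sample event from the question.
event = (
    "ERROR=40392\n"
    '"This error ... blah...blah....\n'
    "... ... .. ... ... .. ... ..... ..\n"
    '... .. ... ... .. . ..."\n'
    "END"
)

# Without (?s), . cannot cross a newline, so the closing quote is never reached.
without_s = re.search(r'ERROR=\d+\s+"(?P<myfield>.+)"\s+END', event)

# With (?s), .+ spans all lines up to the final quote before END.
with_s = re.search(r'(?s)ERROR=\d+\s+"(?P<myfield>.+)"\s+END', event)

print(without_s)  # None
print(with_s.group("myfield"))
```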

You can use that regex to create a search-time field extraction (in the GUI: Settings > Fields > Field extractions), and that will be placed into props.conf.
Or you can use it for a rex in your search:

... | rex "(?s)ERROR=\d+\s+\"(?P<myfield>.+)\"\s+END" | head 1 | table myfield

In your research you may have come across .* as well as .+
The .* means zero or more characters, and .+ means one or more. Both are greedy: they grab as much as they can and only give characters back when the rest of the pattern needs them, so .+ is no less greedy than .*. If you want the shortest possible match instead, add a ? to make them lazy: .*? or .+?
In this case, either is fine, but only use * when you really need it (when you think you might have zero characters). 🙂
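A sketch of greedy versus lazy matching, using Python's `re` module as a stand-in for Splunk's PCRE. The event with two ERROR ... END blocks is hypothetical; it shows that .+ and .* are equally greedy, and that the lazy .+? stops at the first closing quote followed by END.

```python
import re

# Hypothetical event with two ERROR ... END blocks on one line.
event = 'ERROR=1 "first" END ERROR=2 "second" END'

# Greedy .+ and .* both run to the LAST closing quote followed by END.
greedy = re.search(r'ERROR=\d+\s+"(.+)"\s+END', event).group(1)
star = re.search(r'ERROR=\d+\s+"(.*)"\s+END', event).group(1)

# Lazy .+? stops at the FIRST closing quote followed by END.
lazy = re.search(r'ERROR=\d+\s+"(.+?)"\s+END', event).group(1)

print(greedy)  # first" END ERROR=2 "second
print(star)    # first" END ERROR=2 "second
print(lazy)    # first
```

When an event can contain more than one ERROR ... END block, the lazy form is usually the safer choice, for example together with max_match=0.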

Re: Regex: I want to match a string and then extract the next lines until matching another string

mohan401 — Thu, 13 Jul 2017 05:59:40 GMT

For this sample log entry
dkf:fhj fjff jffj from IP 11.11.111.11. jdjd"\n