Splunk Search

Regex to extract varying string

ahogbin
Communicator

Hello..

I am attempting to extract a string of varying format using regex. I have successfully extracted part of the string, but I am struggling to extract it when it contains white space or a special character such as '-'.

The text I am trying to extract always has a space before it and is always followed by ' [':
DEV NS [
CI-DEV [
TST [

My regex so far is: rex "(?\w+) \["

But it only extracts single blocks of text, which is fine if there is only one block (as in the case of TST). If there are 2 blocks (e.g. DEV NS) or text with a hyphen (CI-DEV), it does not extract the whole string.

Long story short... how do I modify the expression to include the whole string (space and hyphen)?

As always help is very much appreciated.

Cheers,

Alastair

0 Karma
1 Solution

landen99
Motivator

This captures uppercase letters, dashes and whitespace after an " O " when the capture group is followed by a space and an open bracket:
https://regex101.com/r/gD4eW7/3

| rex "O [^A-Z]*(?<myfield>[A-Z\-\s]+) \["

Added: If you want to match starting after " O " while ignoring only "nevo-web" specifically, the most efficient regex is probably:

   | rex "O (nevo-web )?(?<myfield>[A-Z\-\s]+) \["

I used "O [^A-Z]*" in case there were other unanticipated lowercase words in front of your pattern of interest.
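For anyone who wants to check the first pattern outside Splunk, here is a minimal sketch in Python, run against the three sample events quoted in this thread. One assumption to be aware of: Python's re module spells named groups as (?P<name>...), whereas rex accepts (?<name>...), so the group syntax is adapted and the rest of the pattern is unchanged. The helper name extract is hypothetical.

```python
import re

# Sample events quoted in this thread.
EVENTS = [
    "[7/03/16 12:42:24:999 AEDT] 0000005e SystemOut O BLD NS [WebContainer : 2]",
    "[7/03/16 12:02:13:370 EST] 00000060 SystemOut O nevo-web CI-BLD [WebContainer : 4]",
    "[7/03/16 11:58:06:564 EST] 00000092 SystemOut O TST [WebContainer : 2]",
]

# Same pattern as the rex above, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"O [^A-Z]*(?P<myfield>[A-Z\-\s]+) \[")


def extract(event):
    """Return the captured field, or None when the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("myfield") if match else None


for event in EVENTS:
    print(extract(event))
```

On these three events the sketch prints BLD NS, CI-BLD and TST. The [^A-Z]* part is what skips the lowercase "nevo-web " before the capture group starts.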


ahogbin
Communicator

Hello... I have tried the above 2 examples, but neither gives me what I am after, and both manage to exclude most of the entries I am after.
Why will my solution give me problems? It is only dealing with a small set of data and returns everything I am after.

From my limited knowledge, my query looks for an 'O' and then excludes the word nevo-web if it exists. It then returns everything else before the [, with the end result of spitting out the string I am after.

The problem with the examples to date is that they are missing the text after the first white space and before the second (BLD NS) and are only returning NS.

Open to better solutions and I do appreciate everyone's input

0 Karma

landen99
Motivator

MuS made a good catch by adding \s to capture multiple words in the pattern, including "BLD". I meant to do that originally, but I was only looking at two full events when I created the regex.

Addressing the problems question, in general, regex works best by matching patterns from left to right. Look-aheads, etc. are not that efficient and they require the pattern to exist or to not exist (less flexibility). Since this is Splunk, I assumed large datasets, and even small datasets can become large over time. Also, it is best to match as generally as possible in case the logs deviate from your test data.

0 Karma

ahogbin
Communicator

That makes sense.. thank you for taking the time to clarify.

Cheers.

Alastair

0 Karma

MuS
Legend

In addition this little modification will get all needed results:

.... | rex "O [^A-Z]*(?<myfield>[A-Z\-\s]+) \[" | ...
0 Karma

landen99
Motivator

Good catch. I agree.

0 Karma

ahogbin
Communicator

Perfect... just out of curiosity, why is that any better than excluding a specific string, as in [^"nevo-web"]?

Thanks for all your help.

Alastair

0 Karma

ahogbin
Communicator

Got it... rex "(?[^\.O+[^"nevo-web"]+)\s\[" seems to do the trick

Thanks for the help and suggestions

0 Karma

landen99
Motivator

The formatting for that regex did not come through right, but if it is doing what it looks like, that approach will give you problems and will take much more time than it should to complete the task, even if it works right. Check out my answer below.
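To illustrate one likely source of trouble, here is a small sketch in Python, assuming the bracketed part of that attempt is meant as [^"nevo-web"]. Square brackets define a character class, so this matches single characters rather than excluding the word "nevo-web". Inside the class, o-w is also read as the range o through w. The helper name runs is hypothetical.

```python
import re

# Assumption: the attempt uses a negated character class. It excludes the
# characters ", n, e, v, b and every letter in the range o..w. The hyphen is
# consumed as the range operator, so a literal '-' is NOT excluded.
NEGATED_CLASS = re.compile(r'[^"nevo-web"]+')


def runs(text):
    """Return every run of characters that the negated class accepts."""
    return NEGATED_CLASS.findall(text)


print(runs("nevo-web CI-BLD"))  # ['-', ' CI-BLD']
print(runs("sever"))            # [] since s, e, v and r are all excluded
```

The class happens to skip the letters of "nevo-web" here, but it would equally reject those letters in any unrelated lowercase word, which is why matching the literal prefix is more predictable.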

0 Karma

esix_splunk
Splunk Employee

Why not capture everything between brackets...

 ... | rex "\]\s(?<myField>[^\]]+)\[" | ...
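As a rough check of what this pattern captures, here is a sketch in Python run against one of the sample events from this thread. The group syntax is adapted to Python's (?P<name>...) form, which is an assumption of the sketch and not something rex requires. The helper name extract is hypothetical.

```python
import re

# Same pattern as the rex above, with Python's named-group syntax.
PATTERN = re.compile(r"\]\s(?P<myField>[^\]]+)\[")

EVENT = "[7/03/16 12:42:24:999 AEDT] 0000005e SystemOut O BLD NS [WebContainer : 2]"


def extract(event):
    """Return the text between the first ']' and the next '[', or None."""
    match = PATTERN.search(event)
    return match.group("myField") if match else None


print(repr(extract(EVENT)))  # '0000005e SystemOut O BLD NS '
```

The capture includes the thread id, "SystemOut" and the "O" marker, plus a trailing space, so further trimming would still be needed to isolate BLD NS.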
0 Karma

ahogbin
Communicator

Because I am only after the specific text. I am gathering everything except for the string composed of 2 parts (BLD NS).

0 Karma

MuS
Legend

Hi ahogbin,

based on your example and your regex try this:

... | rex "(?<myField>[^\s]+)\s\[" | ...

Hope this helps ...

cheers, MuS

0 Karma

ahogbin
Communicator

Looking good... however it is not picking up any string that has whitespace between the words (eg BLD NS - it is only including the NS component).

[7/03/16 12:23:27:936 AEDT] 0000005c SystemOut O BLD NS [WebContainer : 0]

Other than that it is working perfectly
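The behaviour described here can be reproduced with a short Python sketch, assuming the pattern under test is (?<myField>[^\s]+)\s\[ with the group rewritten in Python's (?P<name>...) syntax. The helper name extract is hypothetical.

```python
import re

# [^\s]+ cannot cross whitespace, so only the last word before " [" is kept.
PATTERN = re.compile(r"(?P<myField>[^\s]+)\s\[")


def extract(event):
    """Return the last whitespace-free token before ' [', or None."""
    match = PATTERN.search(event)
    return match.group("myField") if match else None


print(extract("[7/03/16 12:23:27:936 AEDT] 0000005c SystemOut O BLD NS [WebContainer : 0]"))
# NS
print(extract("[7/03/16 12:02:13:370 EST] 00000060 SystemOut O nevo-web CI-BLD [WebContainer : 4]"))
# CI-BLD
```

Allowing whitespace inside the capture group, as in [A-Z\-\s]+, is what lets a two-word value such as BLD NS through.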

Cheers

0 Karma

MuS
Legend

Can you provide all possible combinations please?

0 Karma

ahogbin
Communicator

There are 3 possible combinations
[7/03/16 12:42:24:999 AEDT] 0000005e SystemOut O BLD NS [WebContainer : 2]
[7/03/16 12:02:13:370 EST] 00000060 SystemOut O nevo-web CI-BLD [WebContainer : 4]
[7/03/16 11:58:06:564 EST] 00000092 SystemOut O TST [WebContainer : 2]

The extracted string is BLD NS or CI-BLD or TST

Your example works perfectly for all but BLD NS

Thank you so much for your help

Cheers,

Alastair

0 Karma

ahogbin
Communicator

Have gotten a little closer

rex "(?\w{1,4}(?:\s|\-)\w{1,4}) \["

extracts the string I am after, but for some reason some of the strings are extracted with an 'O' in front of them and others are not

O TST

The log entry is

[7/03/16 11:32:49:079 EST] 000000b4 SystemOut O TST [WebContainer : 4]

The ones working correctly are

[7/03/16 11:32:49:101 EST] 00000060 SystemOut O nevo-web CI-TST [WebContainer : 0]

IE: the CI-TST string is extracted.

How do I stop the leading 'O' from being included in the string ?
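The following Python sketch reproduces the symptom. Two assumptions: the group name was lost from the regex as posted, so "myfield" is a stand-in, and the group uses Python's (?P<name>...) syntax. The helper name extract is hypothetical.

```python
import re

# \w{1,4}(?:\s|\-)\w{1,4} accepts any two short words joined by a space or a
# hyphen, and the lone "O" marker counts as a one-character word.
PATTERN = re.compile(r"(?P<myfield>\w{1,4}(?:\s|\-)\w{1,4}) \[")


def extract(event):
    """Return the first 'word, separator, word' pair found before ' ['."""
    match = PATTERN.search(event)
    return match.group("myfield") if match else None


print(extract("[7/03/16 11:32:49:079 EST] 000000b4 SystemOut O TST [WebContainer : 4]"))
# O TST
print(extract("[7/03/16 11:32:49:101 EST] 00000060 SystemOut O nevo-web CI-TST [WebContainer : 0]"))
# CI-TST
```

When the value is a single word such as TST, the pattern still needs two words, so it pairs the "O" marker with it. A pattern that matches the literal "O " before the capture group consumes the marker outside the group and avoids this.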

0 Karma