How to extract a field value using Regex with Field Extractor

ezmo1982
Path Finder

Hi

I am trying to use Regex with the Field Extractor to extract the value of a particular field in a given piece of text, but am having a problem with the regex.

The text is in the format "text | message: value | more text". Basically, I need to extract the value of the field 'message' and put it into a field named raw_message. The value of the message field can be any string.

Each field/value pair in the text is separated by a pipe character, as can be seen below. I just want to extract the value of the 'message' field. All other text can be ignored. The ":" character that follows the field name can also be ignored.

Sample text below:

| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local

So the regex needs to extract "P-235332" into a new field named raw_message.

Can somebody help me with a Regex that would work with this?

Thanks.



ezmo1982
Path Finder

Apologies, I should have mentioned that there is a possibility that the value can have space characters in it. So the regex you supplied only matches the text before a space appears.

Another sample is below. In this example, the regex would need to capture "P-235332 55 clm", that is, everything before the next pipe character.

| source: 10.2.2.134 | message: P-235332 55 clm | host: clmm0011.syn.local

Can you provide updated SPL for the above?


moliminous
Path Finder

Yes, for that you can use . to match any character and + to require one or more of them. The ? makes the match lazy, so it doesn't try to grab everything to the end of the line. Then, outside the named capture group, we give it the characters that follow the field value we want, which in this case are a whitespace character (\s) and a pipe (|).

The pipe | character has to be escaped, since an unescaped | means alternation (OR) in regex, so we put a backslash \ before it.

The end result is this regex, which should work for you:

message:\s*(?<raw_message>.+?)\s\|
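As a quick sanity check outside Splunk, the pattern can be run against both sample lines from this thread. This is only a sketch in Python, not SPL: Python's re module spells the named group (?P<raw_message>...) rather than (?<raw_message>...), and the helper name extract_message is made up for illustration.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling of the named group.
PATTERN = re.compile(r"message:\s*(?P<raw_message>.+?)\s\|")

def extract_message(text):
    """Return the value of the 'message' field, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group("raw_message") if match else None

samples = [
    "| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local",
    "| source: 10.2.2.134 | message: P-235332 55 clm | host: clmm0011.syn.local",
]

for line in samples:
    print(extract_message(line))
# P-235332
# P-235332 55 clm
```

Note that the pattern expects a whitespace character and a pipe after the value, so in this sketch a message that is the last thing on the line (with no trailing pipe) returns None.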

moliminous
Path Finder

You will need a named capture group.

If there may or may not be a space after the colon, or you're not sure, use this:

message:\s*(?<raw_message>\S+)

The asterisk allows zero or more whitespace characters (\s). You can learn more at regex101.com or other sources.

If there is always exactly one space after the colon, you could just use this:

message: (?<raw_message>\S+)

It's essentially the same as the regex the other person posted, but theirs is missing the ? that starts the named capture group, and you don't really need anything before 'message' in this case.
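To see what \S+ does and does not capture, here is a small Python sketch run against the sample lines from this thread. It is an illustration only: Python needs the (?P<raw_message>...) spelling for the named group, and the helper name first_token is hypothetical.

```python
import re

# \S+ stops at the first whitespace character, so it only suits values without spaces.
NO_SPACES = re.compile(r"message:\s*(?P<raw_message>\S+)")

def first_token(text):
    """Return the first whitespace-free token after 'message:', or None."""
    match = NO_SPACES.search(text)
    return match.group("raw_message") if match else None

print(first_token("| message: P-235332 | host: clmm0011.syn.local"))         # P-235332
print(first_token("| message: P-235332 55 clm | host: clmm0011.syn.local"))  # P-235332
print(first_token("| message:P-235332 | host: clmm0011.syn.local"))          # P-235332
```

The second line shows the limitation: a value containing spaces is cut off at the first space. The third shows that \s* also tolerates a missing space after the colon.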


yuanliu
SplunkTrust

A usable regex would be "| message: (<raw_message>\S+)".
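If you anchor on the leading pipe like this, two details matter: the pipe needs a backslash, and the named group needs its ? before the angle brackets. The Python sketch below illustrates both against the sample line. It shows Python's re behaviour only and uses Python's (?P<raw_message>...) spelling, so treat it as an illustration rather than a statement about how Splunk reports these cases.

```python
import re

line = "| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local"

# Unescaped leading pipe: the pattern reads as "empty OR ' message: ...'".
# The empty alternative matches first, so the group is never filled.
unescaped = re.search(r"| message: (?P<raw_message>\S+)", line)
print(repr(unescaped.group(0)), unescaped.group("raw_message"))  # '' None

# Escaped pipe: a literal '|' is required before ' message:', and the group is filled.
escaped = re.search(r"\| message: (?P<raw_message>\S+)", line)
print(escaped.group("raw_message"))  # P-235332

# Without the '?', (<raw_message>\S+) is an ordinary group that looks for the
# literal text "<raw_message>", which the sample line does not contain.
no_question_mark = re.search(r"message: (<raw_message>\S+)", line)
print(no_question_mark)  # None
```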
