Hi
I am trying to use Regex with the Field Extractor to extract the value of a particular field in a given piece of text, but am having a problem with the regex.
The text is in the format " text | message: value | more text ". So basically I need to extract the value of the field 'message' and put it into a field named raw_message. The value of the message field can be any string.
Each field/value pair in the text is separated by a pipe character, as can be seen below. I just want to extract the value of the 'message' field. All other text can be ignored. The ":" character that follows the field name can also be ignored.
Sample text below:
| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local
So the regex needs to extract "P-235332" into a new field named raw_message.
Can somebody help me with a Regex that would work with this?
Thanks.
Yes, for that you could use . inside the named capture group to match any character. The + means one or more of them, and the ? makes the match lazy, so it stops as early as it can instead of grabbing everything to the end of the line. Then, outside the named capture group, we give it the characters that appear right after the field value we want, which in this case are a whitespace character, \s, and a pipe | character.
The pipe | character has to be escaped, since an unescaped pipe means alternation ("or") in regex, so we put a backslash \ before it.
The end result is this regex, which should work for you:
message:\s*(?<raw_message>.+?)\s\|
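If you want to check the pattern outside Splunk first, here is a minimal sketch in Python. It is only an approximation of what Splunk does: Python's re module spells named groups as (?P<name>...) rather than (?<name>...), and the function name extract_raw_message is made up for this example. The two sample strings are the ones posted in this thread.

```python
import re

# Same pattern as the Splunk regex, except that Python's re module
# requires (?P<name>...) for a named group instead of (?<name>...).
PATTERN = re.compile(r"message:\s*(?P<raw_message>.+?)\s\|")


def extract_raw_message(text):
    """Return the value of the 'message' field, or None if nothing matches.

    Hypothetical helper for illustration; it is not part of Splunk.
    """
    match = PATTERN.search(text)
    return match.group("raw_message") if match else None


samples = [
    "| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local",
    "| source: 10.2.2.134 | message: P-235332 55 clm | host: clmm0011.syn.local",
]

for sample in samples:
    print(extract_raw_message(sample))
```

Because the pattern ends with \s\|, it only matches when the value is followed by a space and a pipe. If 'message' can ever be the last field on the line, the pattern would need adjusting.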
Apologies, I should have mentioned that there is a possibility that the value can have space characters in it. So the regex you supplied only matches the text before a space appears.
Another sample text is below. In this example, the regex would need to capture "P-235332 55 clm", so it would need to capture everything before the next pipe character.
| source: 10.2.2.134 | message: P-235332 55 clm | host: clmm0011.syn.local
Can you provide updated SPL for the above?
You will need a named capture group.
If there is a space after the colon, or you're not sure, use this:
message:\s*(?<raw_message>\S+)
The asterisk allows zero or more whitespace characters (\s) after the colon. You can learn more at regex101(dot)com or other sources.
If there is always exactly one space after the colon, you could just use this:
message: (?<raw_message>\S+)
It's essentially the same as what the other person posted, but they were missing the ? for the named capture group, and you don't really need anything before 'message' in this case.
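One thing to keep in mind with \S+ is that it stops at the first whitespace character, so it suits values that never contain spaces. A rough Python sketch of that behaviour, again using Python's (?P<name>...) spelling for the named group and a made-up helper name, extract_first_token:

```python
import re

# \S+ matches a run of non-whitespace characters, so the capture ends
# at the first space, tab, or newline after the value starts.
NON_SPACE = re.compile(r"message:\s*(?P<raw_message>\S+)")


def extract_first_token(text):
    """Return the first whitespace-delimited token after 'message:'.

    Hypothetical helper for illustration; it is not part of Splunk.
    """
    match = NON_SPACE.search(text)
    return match.group("raw_message") if match else None


no_spaces = "| source: 10.2.2.134 | message: P-235332 | host: clmm0011.syn.local"
with_spaces = "| source: 10.2.2.134 | message: P-235332 55 clm | host: clmm0011.syn.local"

print(extract_first_token(no_spaces))
print(extract_first_token(with_spaces))
```

Both calls give back only P-235332, which is why a value containing spaces needs a pattern that runs up to the next pipe instead.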
A usable regex would be "| message: (<raw_message>\S+)".