Hello,
I am terrible at regex and need help using rex to extract a field from another field.
So an event snippet is:
"InterfaceDocument" : "{\"MessageType\":\"ANY_ITM_ImportItems\",\"OrgId\":\"8103_DEV\",\"MessageId\":\"15834815630667925_PUBSUB\",\"ErrorCode\":\"fwe::10001\"
I need to create a field for everything between
{\"MessageType\":\"
and
\"OrgId\
In my search I am using the following:
rex field=InterfaceDocument "MessageType\":\"(?<Type>[^*])"
This only creates a field with the first letter "A".
Could anyone help me get this to work? My apologies, again, I am not regex savvy at all. 🙂
Thanks,
Tom
You can use the following:
| rex field=InterfaceDocument "\"MessageType\":\"(?<Type>[^\"]+)"
Regards,
Prewin
Ugh. This seems like some kind of a string containing a JSON object as a field in another JSON object.
Why not use spath then?
| spath input=InterfaceDocument "MessageType"
Exactly. If the raw event contains both non-JSON and JSON components, extract that JSON component first, then use spath to extract keys from JSON.
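As a rough sketch of that approach: the search below assumes InterfaceDocument has already been extracted and holds a complete JSON object with plain, unescaped quotes. The makeresults and eval lines only build a shortened sample value (with a closing brace added, which the posted snippet does not show), and the output field name Type is just an illustrative choice.

```spl
| makeresults
| eval InterfaceDocument="{\"MessageType\":\"ANY_ITM_ImportItems\",\"OrgId\":\"8103_DEV\"}"
| spath input=InterfaceDocument path=MessageType output=Type
| table Type
```

If the real field value is truncated or still carries literal backslashes before the quotes, spath may not parse it, and a rex-based extraction would be the fallback.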
Getting the number of backslashes right in regex for rex commands can be tricky (as @richgalloway has already alluded to), especially as they are different to what is required by regex101.com, because SPL processes the quoted string before the regex engine sees it! Try something like this:
| makeresults
| fields - _time
| eval InterfaceDocument="\"InterfaceDocument\" : \"{\\\"MessageType\\\":\\\"ANY_ITM_ImportItems\\\",\\\"OrgId\\\":\\\"8103_DEV\\\",\\\"MessageId\\\":\\\"15834815630667925_PUBSUB\\\",\\\"ErrorCode\\\":\\\"fwe::10001\\\""
| rex field=InterfaceDocument "MessageType\\\\\":\\\\\"(?<Type>[^\\\\]+)"
Having said that, these look like JSON fields, so why don't you try using the json_ eval functions (json_extract, for example)?
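A minimal sketch with json_extract, assuming a Splunk version that ships the JSON eval functions (8.1 or later, as far as I know) and a field value that is valid JSON with unescaped quotes. The sample value is shortened and closed with a brace for illustration, and Type is an arbitrary field name.

```spl
| makeresults
| eval InterfaceDocument="{\"MessageType\":\"ANY_ITM_ImportItems\",\"OrgId\":\"8103_DEV\"}"
| eval Type=json_extract(InterfaceDocument, "MessageType")
| table Type
```

This avoids the backslash counting entirely, at the cost of requiring the field to be well-formed JSON.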
Hey @tdavison76,
You can use the following regex in your rex command.
| rex field=InterfaceDocument "MessageType\\\"\:\\\"(?<message_type>[\w\_\d]+)\\\",\\\"OrgId\\\"\:\\\"(?<org_id>[\w\d\_]+)\\\""
You can experiment with the regex here: https://regex101.com/r/auuJ8u/1
For now, I've only captured letters, digits, and underscores (_), all of which \w already covers. If there are any other special characters that are part of either MessageType or OrgId, feel free to add them within the square brackets in the capturing group.
Thanks,
Tejas.
@tdavison76 try something like this?
| rex field=InterfaceDocument "MessageType\":\"(?<MessageType>[^\"]*)"
By default, a character class (anything inside square brackets) will match a single character. That's why you get only "A". Use a quantifier ('+' or '*') to match multiple characters.
The current regex matches anything that is not an asterisk, which would be everything in the sample data. I expect this is not the intent. Try this regular expression, which matches up to the first backslash in the MessageType field.
MessageType\\":\\"(?<Type>[^\\]+)
Note: You probably will need to escape the embedded quotation marks in the expression to get it to work in SPL. Because of the multiple layers of processing, multiple escapes are required, including inside the character class, where each literal backslash is written as four. Something like this:
MessageType\\\\\":\\\\\"(?<Type>[^\\\\]+)
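To see the effect of the quantifier on its own, here is a small sketch that runs the same character class with and without '+'. It assumes a sample value with plain quotes (no literal backslashes) so that the escaping stays simple. OneChar and Type are illustrative field names.

```spl
| makeresults
| eval InterfaceDocument="{\"MessageType\":\"ANY_ITM_ImportItems\",\"OrgId\":\"8103_DEV\"}"
| rex field=InterfaceDocument "\"MessageType\":\"(?<OneChar>[^\"])"
| rex field=InterfaceDocument "\"MessageType\":\"(?<Type>[^\"]+)"
| table OneChar Type
```

With this sample, OneChar should hold only "A" while Type holds the whole value up to the next quotation mark. If the real field does contain backslashes before the quotes, the extra escaping described above applies.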