Hello, I need to extract fields with a regex that I built on regex101, where everything works and matches as expected.
But I ran into an issue where Splunk won't accept optional groups such as "(\\\")?"; it gives an unmatched closing parenthesis error until I add another closing bracket, like so: "(\\\"))?"
The second issue is that once I add this closing bracket, the regex works, but not consistently.
Here's what I mean:
That's a part of my regex:
\[\{(\\)?\"PhoneNumber(\\)?\":(\\)?\"(?<my_PhoneNumber>[^\\\"]+
Won't work until I add more brackets to the optional groups like I mentioned before:
\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+
Second issue:
Adding another part still works:
\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+)\S+OtherPhoneNumber(\\))?\":(\\))?(\"))?(?<myother_PhoneNumber>[^,\\\"]+|null)
Adding a third part in exactly the same format as the second part does not work; it gives the unmatched closing parenthesis error again:
\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+)\S+OtherPhoneNumber(\\))?\":(\\))?(\"))?(?<myother_PhoneNumber>[^,\\\"]+|null)\S+Email(\\))?\":(\\))?(\"))?(?<email>[^,\\\"]+|null)
Am I missing something?
I know the regex itself works
Sample data of the original log:
[{"PhoneNumber":"+1 450555338","AlternativePhoneNumber":null,"Email":null,"VoiceOnlyPhoneNumber":null}]
[{\"PhoneNumber\":\"+20 425554005\",\"AlternativePhoneNumber\":\"+1 455255697\",\"Email\":\"Dam@test.com.us\",\"VoiceOnlyPhoneNumber\":null}]"}
[{\"PhoneNumber\":\"+1 459551561\",\"AlternativePhoneNumber\":\"+1 6155555533\",\"Email\":null,\"VoiceOnlyPhoneNumber\":\"+1 455556868\"}]
Would it be possible to post some sample data? It's a bit too easy to get lost in what is supposed to be an escape character versus a character in your data. Please replace any real phone numbers with dummy values.
Escaping backslashes for regex expressions is always fun, but I suspect that's where your issues are coming from. Escaping a backslash in a regex from the search box requires four backslashes as there are two layers of escaping that are happening.
I try to construct regexes to avoid that:
| makeresults | eval phone_data="[{\"PhoneNumber\":\"123-456-7890\"}]"
| append [ | makeresults | eval phone_data="[{\\\"PhoneNumber\\\":\\\"111-111-1111\\\"}]" ]
| rex field=phone_data "PhoneNumber[^\d]+(?<my_PhoneNumber>[0-9-\(\)]+)"
but if I'm making an incorrect assumption about the characters in a phone number, you can try
| rex field=phone_data "PhoneNumber[^\d]+(?<my_PhoneNumber>[^\\\\\"]+)"
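To make the two layers of escaping concrete, here is a rough Python model (not SPL). It assumes the search-string layer only collapses backslash-backslash into one backslash and backslash-quote into a quote, and it uses Python's re module as a stand-in for Splunk's PCRE engine, so the exact error text will differ. The helper names search_string_layer and compiles are made up for this illustration.

```python
import re

def search_string_layer(text):
    # Assumed model of the first layer: inside a quoted SPL string,
    # backslash-backslash becomes one backslash and backslash-quote becomes a quote.
    # Any other backslash sequence (such as \d or \[) passes through unchanged.
    return re.sub(r'\\([\\"])', r'\1', text)

def compiles(pattern):
    # True if the regex engine accepts the pattern it finally receives.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

two = search_string_layer(r'(\\)?')        # two backslashes typed in the search box
patched = search_string_layer(r'(\\))?')   # the "extra bracket" workaround
four = search_string_layer(r'(\\\\)?')     # four backslashes typed in the search box

for typed_result in (two, patched, four):
    print(typed_result, compiles(typed_result))
```

Under this model, the extra bracket only hides the problem: (\\))? reaches the regex engine as (\))?, which is an optional literal closing parenthesis. It compiles, but it never matches the backslash in the data. Four backslashes reach the engine as (\\)?, an optional literal backslash.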
Do you mean to say that some logs contain valid JSON and some contain quote-escaped JSON? Or was the first entry a misprint, and all logs are in fact quote-escaped JSON, like the following?
log |
[{\"PhoneNumber\":\"+1 450555338\",\"AlternativePhoneNumber\":null,\"Email\":null,\"VoiceOnlyPhoneNumber\":null}] |
[{\"PhoneNumber\":\"+20 425554005\",\"AlternativePhoneNumber\":\"+1 455255697\",\"Email\":\"Dam@test.com.us\",\"VoiceOnlyPhoneNumber\":null}] |
[{\"PhoneNumber\":\"+1 459551561\",\"AlternativePhoneNumber\":\"+1 6155555533\",\"Email\":null,\"VoiceOnlyPhoneNumber\":\"+1 455556868\"}] |
In this illustration, I assume that the "original log" contains additional elements and that only one field (named log) holds the escaped JSON, because it would be very unusual to escape quotation marks if the JSON were the complete log.
If, as I speculated, all log values are escaped, you should aim to reconstruct the JSON rather than use rex to treat it as text. So, I recommend
| rex field=log mode=sed "s/\\\\\"/\"/g"
| spath input=log path={}
| mvexpand {}
| spath input={}
Using Splunk's built-in JSON handling is more robust than any regex you can craft. From the mock data, the above will give you
AlternativePhoneNumber | Email | PhoneNumber | VoiceOnlyPhoneNumber |
null | null | +1 450555338 | null |
+1 455255697 | Dam@test.com.us | +20 425554005 | null |
+1 6155555533 | null | +1 459551561 | +1 455556868 |
This is the emulation for the data:
| makeresults
| eval log = mvappend("[{\\\"PhoneNumber\\\":\\\"+1 450555338\\\",\\\"AlternativePhoneNumber\\\":null,\\\"Email\\\":null,\\\"VoiceOnlyPhoneNumber\\\":null}]",
"[{\\\"PhoneNumber\\\":\\\"+20 425554005\\\",\\\"AlternativePhoneNumber\\\":\\\"+1 455255697\\\",\\\"Email\\\":\\\"Dam@test.com.us\\\",\\\"VoiceOnlyPhoneNumber\\\":null}]",
"[{\\\"PhoneNumber\\\":\\\"+1 459551561\\\",\\\"AlternativePhoneNumber\\\":\\\"+1 6155555533\\\",\\\"Email\\\":null,\\\"VoiceOnlyPhoneNumber\\\":\\\"+1 455556868\\\"}]")
| mvexpand log
``` data emulation above ```
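For comparison outside Splunk, the same unescape-then-parse idea can be sketched in Python. This is only an illustration of the approach, not a replacement for the SPL above. The function name parse_escaped_log is hypothetical, and the sketch assumes that no field value contains a quote character that is legitimately escaped inside the JSON.

```python
import json

def parse_escaped_log(log):
    # Same idea as the sed step: turn every backslash-quote into a plain quote.
    # A log that is already valid JSON passes through unchanged.
    unescaped = log.replace('\\"', '"')
    # Same idea as spath: let a real JSON parser do the field extraction.
    return json.loads(unescaped)

logs = [
    r'[{\"PhoneNumber\":\"+1 450555338\",\"AlternativePhoneNumber\":null,\"Email\":null,\"VoiceOnlyPhoneNumber\":null}]',
    r'[{"PhoneNumber":"+1 450555338","AlternativePhoneNumber":null,"Email":null,"VoiceOnlyPhoneNumber":null}]',
]

for log in logs:
    for contact in parse_escaped_log(log):
        print(contact["PhoneNumber"], contact["Email"])
```

Both the escaped and the unescaped form of the sample parse to the same record, which is the main reason to prefer a parser over a hand-built regex here.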
Thank you, the solution worked.
I tried 4 backslashes and noticed that you used 3. Is there any important difference?
In my example, I use 3 backslashes when creating the sample data. To get \" in a quoted string, you need to escape both the backslash (\\) and the quote (\"), resulting in \\\"
In the regex, I avoided the need to match on backslashes, so any backslash is just the escape character. However, in my alternative method, you'll notice that there are 5 backslashes in a row. The processing of the escape characters happens once for the string itself, taking \\\\\" down to \\", and then once for the regex, taking \\" down to \".
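The reduction described above can be traced step by step in Python. This is a sketch under assumptions: search_string_layer is a made-up model of the first (string) layer, Python's re module stands in for Splunk's regex engine, and the named group is rewritten from (?<name>...) to (?P<name>...) only because Python's syntax requires it.

```python
import re

def search_string_layer(text):
    # Assumed model of the string layer: backslash-backslash becomes one
    # backslash, backslash-quote becomes a quote, everything else is untouched.
    return re.sub(r'\\([\\"])', r'\1', text)

# Three backslashes in the eval literal: each \\\" ends up as \" in the field value.
escaped_event = search_string_layer(r'[{\\\"PhoneNumber\\\":\\\"111-111-1111\\\"}]')

# Five backslashes in the rex: four collapse to two, and \" collapses to ".
typed = r'PhoneNumber[^\d]+(?<my_PhoneNumber>[^\\\\\"]+)'
seen_by_regex = search_string_layer(typed)

# Python spells named groups (?P<name>...), so adapt the syntax before compiling.
pattern = re.compile(seen_by_regex.replace('(?<', '(?P<'))

def extract(event):
    match = pattern.search(event)
    return match.group('my_PhoneNumber') if match else None

print(seen_by_regex)
print(extract('[{"PhoneNumber":"123-456-7890"}]'))
print(extract(escaped_event))
```

One thing this model makes visible: because [^\d]+ consumes every non-digit before the capture group, a value such as "+1 450555338" would be captured without its leading plus sign.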
Thank you for your comment. I posted sample data in the original post, and I will try your suggestion.