Splunk Search

Splunk regex bug/issue

Josh1890
Explorer

Hello, I have a case where I need to use a regex. I built it on regex101, where everything works and it catches everything.

But I encountered an issue where Splunk won't accept optional groups "(\\\")?"; it gives an unmatched closing parenthesis error until you add another closing parenthesis, like so: "(\\\"))?"

The other issue I encountered is that after I add this closing parenthesis the regex works, but not consistently.

Here's what I mean:

That's a part of my regex:

 

 

\[\{(\\)?\"PhoneNumber(\\)?\":(\\)?\"(?<my_PhoneNumber>[^\\\"]+

 

 

Won't work until I add more brackets to the optional groups like I mentioned before:

 

 

\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+

 

 

 

second issue:

adding another part will still work:

 

 

\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+)\S+OtherPhoneNumber(\\))?\":(\\))?(\"))?(?<myother_PhoneNumber>[^,\\\"]+|null)

 

 

 

Adding a third part in exactly the same format as the second part won't work; it gives the unmatched closing parenthesis error again:

 

 

\[\{(\\))?\"PhoneNumber(\\))?\":(\\))?\"(?<my_PhoneNumber>[^\\\"]+)\S+OtherPhoneNumber(\\))?\":(\\))?(\"))?(?<myother_PhoneNumber>[^,\\\"]+|null)\S+Email(\\))?\":(\\))?(\"))?(?<email>[^,\\\"]+|null)

 

 

 

Am I missing something?
I know the regex itself works

 

Sample data of the original log:

 

[{"PhoneNumber":"+1 450555338","AlternativePhoneNumber":null,"Email":null,"VoiceOnlyPhoneNumber":null}]

 

[{\"PhoneNumber\":\"+20 425554005\",\"AlternativePhoneNumber\":\"+1 455255697\",\"Email\":\"Dam@test.com.us\",\"VoiceOnlyPhoneNumber\":null}]"}

 

[{\"PhoneNumber\":\"+1 459551561\",\"AlternativePhoneNumber\":\"+1 6155555533\",\"Email\":null,\"VoiceOnlyPhoneNumber\":\"+1 455556868\"}]


P_vandereerden
Splunk Employee

Would it be possible to post some sample data? It's a bit too easy to get lost in what is supposed to be an escape character versus a character in your data. Please replace any real phone numbers with dummy values.

Escaping backslashes in regular expressions is always fun, and I suspect that's where your issues are coming from. Matching a literal backslash in a regex typed into the search box takes four backslashes, because two layers of escaping are applied: one for the quoted string and one for the regex itself.

I try to construct regexes that avoid that:

| makeresults | eval phone_data="[{\"PhoneNumber\":\"123-456-7890\"}]"
| append [ | makeresults | eval phone_data="[{\\\"PhoneNumber\\\":\\\"111-111-1111\\\"}]" ]
| rex field=phone_data "PhoneNumber[^\d]+(?<my_PhoneNumber>[0-9-\(\)]+)"

But if I'm making an incorrect assumption about the characters in a phone number, you can try

| rex field=phone_data "PhoneNumber[^\d]+(?<my_PhoneNumber>[^\\\\\"]+)"
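The two layers can be imitated outside Splunk. The Python sketch below is only an illustration: `spl_unquote` is a hypothetical helper that assumes the search parser turns `\\` into `\` and `\"` into `"` inside a quoted string before the regex engine sees it, and Python's `re` stands in for the PCRE engine behind `rex`. Under that assumption, `(\\)?` reaches the regex engine as `(\)?`, a group that is never closed, while `(\\))?` becomes `(\))?`, an optional literal `)`, which compiles but only matches the unescaped events.

```python
import re

def spl_unquote(s):
    # Hypothetical model of layer one (an assumption, not Splunk's real parser):
    # inside the quoted string, \\ collapses to \ and \" collapses to ".
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and s[i + 1:i + 2] in ("\\", '"'):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def compiles(pattern):
    # Layer two: hand the unquoted text to a regex engine.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Patterns as typed between the quotes of a rex command.
typed_two = r'\[\{(\\)?\"PhoneNumber'      # two backslashes in the group
typed_extra = r'\[\{(\\))?\"PhoneNumber'   # the "extra parenthesis" workaround
typed_four = r'\[\{(\\\\)?\"PhoneNumber'   # four backslashes in the group

plain = '[{"PhoneNumber":"+1 450555338"}]'
escaped = r'[{\"PhoneNumber\":\"+20 425554005\"}]'

for typed in (typed_two, typed_extra, typed_four):
    seen = spl_unquote(typed)
    if not compiles(seen):
        print(typed, "->", seen, "-> does not compile")
        continue
    rx = re.compile(seen)
    print(typed, "->", seen,
          "-> plain:", bool(rx.search(plain)),
          "escaped:", bool(rx.search(escaped)))
```

In this model only the four-backslash form behaves as an optional backslash and matches both the plain and the quote-escaped sample.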

 

Paul van der Eerden,
Breaking software for over 20 years.


yuanliu
SplunkTrust

Sample data of the original log:

[{"PhoneNumber":"+1 450555338","AlternativePhoneNumber":null,"Email":null,"VoiceOnlyPhoneNumber":null}]

[{\"PhoneNumber\":\"+20 425554005\",\"AlternativePhoneNumber\":\"+1 455255697\",\"Email\":\"Dam@test.com.us\",\"VoiceOnlyPhoneNumber\":null}]"}

[{\"PhoneNumber\":\"+1 459551561\",\"AlternativePhoneNumber\":\"+1 6155555533\",\"Email\":null,\"VoiceOnlyPhoneNumber\":\"+1 455556868\"}]


Do you mean to say that some logs contain valid JSON and some contain quote-escaped JSON? Or was the first entry a misprint, and all logs are in fact quote-escaped JSON, like the following?

log
[{\"PhoneNumber\":\"+1 450555338\",\"AlternativePhoneNumber\":null,\"Email\":null,\"VoiceOnlyPhoneNumber\":null}]
[{\"PhoneNumber\":\"+20 425554005\",\"AlternativePhoneNumber\":\"+1 455255697\",\"Email\":\"Dam@test.com.us\",\"VoiceOnlyPhoneNumber\":null}]
[{\"PhoneNumber\":\"+1 459551561\",\"AlternativePhoneNumber\":\"+1 6155555533\",\"Email\":null,\"VoiceOnlyPhoneNumber\":\"+1 455556868\"}]

In this illustration, I assume that the "original log" contains some additional elements and that only one field (named log) holds the escaped JSON, because it would be very unreasonable to escape quotation marks if this were the complete log.

If, as I speculate, all log values are escaped, you should aim to reconstruct the JSON rather than use rex to treat the values as text. So, I recommend

 

| rex field=log mode=sed "s/\\\\\"/\"/g"
| spath input=log path={}
| mvexpand {}
| spath input={}

 

Using Splunk's built-in JSON handling is more robust than any regex you can craft.  From the mock data, the above will give you

AlternativePhoneNumber | Email           | PhoneNumber   | VoiceOnlyPhoneNumber
null                   | null            | +1 450555338  | null
+1 455255697           | Dam@test.com.us | +20 425554005 | null
+1 6155555533          | null            | +1 459551561  | +1 455556868

This is the emulation for the data

 

| makeresults
| eval log = mvappend("[{\\\"PhoneNumber\\\":\\\"+1 450555338\\\",\\\"AlternativePhoneNumber\\\":null,\\\"Email\\\":null,\\\"VoiceOnlyPhoneNumber\\\":null}]",

"[{\\\"PhoneNumber\\\":\\\"+20 425554005\\\",\\\"AlternativePhoneNumber\\\":\\\"+1 455255697\\\",\\\"Email\\\":\\\"Dam@test.com.us\\\",\\\"VoiceOnlyPhoneNumber\\\":null}]",

"[{\\\"PhoneNumber\\\":\\\"+1 459551561\\\",\\\"AlternativePhoneNumber\\\":\\\"+1 6155555533\\\",\\\"Email\\\":null,\\\"VoiceOnlyPhoneNumber\\\":\\\"+1 455556868\\\"}]")
| mvexpand log
``` data emulation above ```
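The unescape-then-parse idea behind the sed and spath steps can also be checked outside Splunk. This is a minimal Python sketch, not a replacement for the SPL: `parse_escaped_log` is a hypothetical name, and the plain string replace assumes that no field value legitimately contains an escaped quote.

```python
import json

def parse_escaped_log(log):
    # Undo the quote escaping (\" -> "), then let a JSON parser do the rest.
    # Assumption: no value contains a genuine escaped quote of its own.
    return json.loads(log.replace('\\"', '"'))

# One of the mock events from the emulation, as it would appear in the log field.
log = (r'[{\"PhoneNumber\":\"+20 425554005\",'
       r'\"AlternativePhoneNumber\":\"+1 455255697\",'
       r'\"Email\":\"Dam@test.com.us\",'
       r'\"VoiceOnlyPhoneNumber\":null}]')

records = parse_escaped_log(log)
for record in records:
    print(record["PhoneNumber"], record["Email"], record["VoiceOnlyPhoneNumber"])
```

Because the replace is a no-op on text that has no escaped quotes, the same function also accepts an event that is already valid JSON.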

 



Josh1890
Explorer

Thank you, the solution worked.
I tried 4 backslashes and noticed that you used 3. Is there an important difference?

P_vandereerden
Splunk Employee

In my example, I use 3 backslashes when creating the sample data. To get \" in a quoted string, you need to escape both the backslash (\\) and the quote (\"), resulting in \\\"

In the regex, I avoided the need to match on backslashes, so every backslash there is just an escape character. In my alternative method, however, you'll notice that there are 5 backslashes in a row. Escape processing happens twice: once for the quoted string, taking \\\\\" down to \\", and then once for the regex, which reads \\" as a literal backslash followed by a quote.
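The backslash counts can be traced with a small Python sketch. As before, `spl_unquote` is a hypothetical stand-in that assumes the quoted-string layer maps `\\` to `\` and `\"` to `"`; it is not Splunk's parser, and Python's `re` plays the role of the regex layer. An unnamed group is used because Python spells named groups differently from PCRE.

```python
import re

def spl_unquote(s):
    # Hypothetical model of the quoted-string layer (an assumption):
    # \\ collapses to \ and \" collapses to "; everything else is kept.
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and s[i + 1:i + 2] in ("\\", '"'):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

# Three backslashes in the eval string leave one backslash and a quote in the data.
typed_data = r'[{\\\"PhoneNumber\\\":\\\"111-111-1111\\\"}]'
data = spl_unquote(typed_data)
print(data)

# Five backslashes in the rex string leave \\" for the regex engine, which then
# reads the character class as "anything except a backslash or a quote".
typed_regex = r'PhoneNumber[^\d]+([^\\\\\"]+)'
seen_by_regex = spl_unquote(typed_regex)
print(seen_by_regex)

match = re.search(seen_by_regex, data)
print(match.group(1))
```

The character class stops the capture at the first backslash or quote, so the same pattern also works on events whose quotes are not escaped.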


Josh1890
Explorer

Thank you for your comment. I posted sample data in the original post, and I will try your suggestion.
