Hello Everyone,
I want to check whether a field called "from_header_displayname" contains any non-ASCII (Unicode) characters.
Below is the event source; this example event contains the Unicode escape "\u0445":
"from_header_displayname": "'support@\u0445.comx.com'"
And the following is what I see in the web console, where the escape has been rendered as "х" (note: it's not the Latin letter x, but the Cyrillic letter that looks like it):
from_header_displayname: 'support@х.comx.com'
I tried the following search, with no luck:
index=email | regex from_header_displayname="[\u0000-\uffff]"
Error in 'SearchOperator:regex': The regex '[\u0000-\uffff]' is invalid. Regex: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u.
Please advise what I should use in this case.
Thanks in advance.
Regards,
Iris
To check whether a field contains Unicode characters, you can use the regex command with a regular expression that matches non-ASCII characters, but if you want to filter on the result you might be better off with something like the match function.
index=email | eval is_unicode = if(match(from_header_displayname, "[^\x00-\x7F]"), "true", "false") | where is_unicode="true"
This search uses the match function to check whether the from_header_displayname field contains any character outside the ASCII range (\x00-\x7F). If it does, is_unicode is set to "true", and the where command keeps only those events.
Alternatively, you can directly filter the events using the where command with the match function.
index=email | where match(from_header_displayname, "[^\x00-\x7F]")
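To sanity-check the character-class logic outside Splunk, here is a minimal Python sketch. It assumes the events are already parsed into dictionaries; the `events` list and the `has_non_ascii` helper are hypothetical, and Python's `re` module stands in for Splunk's PCRE2-based match().

```python
import re

# Matches any character outside the 7-bit ASCII range (U+0000-U+007F).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(value: str) -> bool:
    # Hypothetical helper mirroring match(field, "[^\x00-\x7F]") in SPL.
    return NON_ASCII.search(value) is not None


# Hypothetical sample events: the first uses Cyrillic U+0445, the second plain ASCII.
events = [
    {"from_header_displayname": "'support@\u0445.comx.com'"},
    {"from_header_displayname": "'support@x.comx.com'"},
]

# Rough equivalent of: | where match(from_header_displayname, "[^\x00-\x7F]")
flagged = [e for e in events if has_non_ascii(e["from_header_displayname"])]

print(f"{len(flagged)} of {len(events)} events contain non-ASCII characters")
```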
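Here is another working example. The first field is meant to hold the escape text as it appears in the raw event, the second the actual Cyrillic character: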
| makeresults
| eval from_header_displayname="support@\u0445.comx.com"
| eval from_header_displayname_unicode="support@х.comx.com"
| eval unicode_detected_raw=if(match(from_header_displayname,"[^\x00-\x7F]"),"Yes","No")
| eval unicode_detected_unicode=if(match(from_header_displayname_unicode,"[^\x00-\x7F]"),"Yes","No")
| table from_header_displayname unicode_detected_raw from_header_displayname_unicode unicode_detected_unicode
Either approach will identify events where the from_header_displayname field contains non-ASCII (Unicode) characters.
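The raw-versus-decoded distinction matters for real data too. In the event source from the question, the character is stored as the JSON escape \u0445, which is six ASCII characters; only after the JSON is parsed does it become a single Cyrillic character. A minimal Python sketch of that, assuming the event is valid JSON (the `raw_event` string is a hypothetical reconstruction of the snippet in the question):

```python
import json
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]")

# Raw event text as stored: the backslash-u escape is plain ASCII text.
raw_event = r"""{"from_header_displayname": "'support@\u0445.comx.com'"}"""

# Checking the raw text finds nothing, because every byte is ASCII.
raw_has_non_ascii = NON_ASCII.search(raw_event) is not None

# Parsing the JSON decodes \u0445 into the actual character U+0445.
field = json.loads(raw_event)["from_header_displayname"]
field_has_non_ascii = NON_ASCII.search(field) is not None

print("raw event:", raw_has_non_ascii)
print("parsed field:", field_has_non_ascii)
```

This prints False for the raw text and True for the parsed field, which suggests running the check against the extracted field rather than the raw event text.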
Thank you all for your replies! It helps!
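Unicode includes the ASCII characters, so a range of 0000-ffff would match every 16-bit character, ASCII included, and would flag every event. (PCRE2 also writes a code point as \x{hhhh} rather than \uhhhh, which is why that regex was rejected.) If you are looking for characters outside ASCII, or outside the 8-bit range, you could use either of these: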
| eval hasUnicode=if(match(string, "[^[:ascii:]]"), "HAS-NON-ASCII", "ASCII")
| eval hasUnicode=if(match(string, "[^\x00-\xff]"), "HAS-16-BIT", "8-BIT")
The first uses the POSIX [:ascii:] class and matches any character NOT in the ASCII range (0x00-0x7F); the second matches any character above 0xFF, i.e. anything that does not fit in 8 bits.
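To see why a full-range class cannot detect Unicode, here is a small Python sketch (Python's `re` happens to accept \uXXXX in patterns, unlike PCRE2). The [\u0000-\uffff] class matches a plain ASCII string just as readily as one containing Cyrillic, whereas the negated classes only fire on the characters of interest. Python's `re` has no [:ascii:] class, so [^\x00-\x7F] stands in for it; the sample strings are hypothetical.

```python
import re

samples = {
    "ascii_only": "support@x.comx.com",
    "cyrillic": "support@\u0445.comx.com",  # U+0445, Cyrillic small letter ha
    "latin1": "caf\u00e9",                   # U+00E9 still fits in 8 bits
}

full_range = re.compile(r"[\u0000-\uffff]")  # any BMP character, ASCII included
non_ascii = re.compile(r"[^\x00-\x7F]")      # stands in for [^[:ascii:]]
non_8bit = re.compile(r"[^\x00-\xFF]")       # anything above U+00FF

# Columns: sample name, full-range match, non-ASCII match, non-8-bit match.
for name, text in samples.items():
    print(
        name,
        bool(full_range.search(text)),
        bool(non_ascii.search(text)),
        bool(non_8bit.search(text)),
    )
```

The full-range column is True for every sample, so it tells you nothing; the Latin-1 sample shows where the ASCII and 8-bit checks differ.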
This example, which includes your lowercase Cyrillic x (U+0445), demonstrates both:
| makeresults
| eval string=printf("{\"from_header_displayname\": \"'support@%c.comx.com'\"}", 1024+69)
| eval hasUnicode1=if(match(string, "[^[:ascii:]]"), "HAS-NON-ASCII", "ASCII")
| eval hasUnicode2=if(match(string, "[^\x00-\xff]"), "HAS-16-BIT", "8-BIT")
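The printf("%c", 1024+69) call builds the character from its code point: 1024 + 69 = 1093 = 0x445, i.e. U+0445. A rough Python equivalent of that makeresults search, offered only as a sketch of the same two checks: chr() plays the role of %c, and str.isascii() (Python 3.7+) stands in for the [:ascii:] test.

```python
import re

code_point = 1024 + 69  # 0x445
ch = chr(code_point)    # Cyrillic small letter ha
string = '{"from_header_displayname": "\'support@%s.comx.com\'"}' % ch

# Same labels as the SPL example above.
has_unicode1 = "ASCII" if string.isascii() else "HAS-NON-ASCII"
has_unicode2 = "HAS-16-BIT" if re.search(r"[^\x00-\xff]", string) else "8-BIT"

print(hex(code_point), has_unicode1, has_unicode2)  # 0x445 HAS-NON-ASCII HAS-16-BIT
```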