Search for non-ASCII characters

bhupalbobbadi
Path Finder

Is there any way to search for events that contain special characters? Thanks in advance for any help.

yuanliu
SplunkTrust

Splunk ingests all input as UTF-8, which is a superset of ASCII. So the boilerplate answer is yes, there is a way. Is there something you know is in your events that you are having difficulty with? Are you concerned about 7-bit ASCII, 8-bit extended ASCII, or something totally different? To get specific help, you need to illustrate your input and your desired output.
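As a rough illustration of the "yes, there is a way" part, here is a minimal Python sketch that flags text containing any character outside the 7-bit ASCII range. The function names are made up for this example. I believe the same character class, `[^\x00-\x7F]`, can be handed to Splunk's `regex` or `rex` command, but treat that as an assumption to verify against your own data.

```python
import re

# Matches any single character outside the 7-bit ASCII range (U+0000 to U+007F).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(text):
    """Return True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None


def non_ascii_chars(text):
    """Return the distinct non-ASCII characters, in order of first appearance."""
    seen = []
    for ch in NON_ASCII.findall(text):
        if ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    print(has_non_ascii("Your Card has been used"))    # plain ASCII input
    print(has_non_ascii("\u03a5\u03bfur C\u0430rd"))   # Greek and Cyrillic look-alikes
```

Note that this only answers "is anything non-ASCII present?". It would also flag legitimate accented names or currency symbols, so it is a coarse first filter rather than a spoof detector.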


bhupalbobbadi
Path Finder

For example, I have some events where a field has the following content, and other events that have just plain ASCII content. I need to search for the events that have any special (non-ASCII) characters in that field.

 

JP-Morgan - Υ‌ο‌ս‌r‌ С‌а‌r‌ⅾ‌ h‌а‌ѕ‌ b‌е‌е‌ո‌ ս‌ѕ‌е‌ⅾ‌ t‌ο‌ с‌h‌а‌r‌ց‌е‌ $24‌2‌.4‌5‌ а‌t‌ TARGET,MA ⅾ‌а‌t‌е‌ⅾ‌ 04/29/2‌0‌2‌4. І‌f‌ у‌о‌ս‌ ⅾ‌і‌ⅾ‌ո‌'t‌ а‌t‌t‌е‌ⅿ‌р‌t‌ t‌h‌і‌ѕ‌ t‌r‌а‌ո‌ѕ‌а‌с‌t‌і‌ο‌ո‌, ⅴ‌і‌ѕ‌і‌t‌ https://cutt.ly/Zeqkaq14 t‌ο‌ с‌а‌ո‌с‌е‌І‌ ո‌о‌ԝ‌.


yuanliu
SplunkTrust

Do you mean symbols scammers use to spoof Latin letters and letters in other writing systems? This question came up earlier, too. Those cannot be characterized as "non-ASCII characters" in the everyday sense of stray or garbled symbols. They are UTF-8 representations of real letters from entirely different writing systems that happen to look like letters in a familiar language such as English. I am not aware of a built-in function, in Splunk or in most programming languages, that maps those spoofs to English letters or maps English letters to all variants of spoofs.
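To see that these really are letters from other writing systems, here is a small Python sketch that uses the standard `unicodedata` module to print the official Unicode name of each non-ASCII character. The helper names are hypothetical, and taking the first word of the Unicode name as the "script" is only a rough heuristic I am assuming for illustration, not a proper script lookup.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (character, code point, Unicode name) for each distinct non-ASCII character."""
    rows = []
    seen = set()
    for ch in text:
        if ord(ch) > 0x7F and ch not in seen:
            seen.add(ch)
            rows.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return rows


def scripts_in(text):
    """Rough heuristic: first word of the Unicode name of each non-ASCII letter.

    Only characters for which str.isalpha() is true are considered, so
    zero-width characters and other non-letters are skipped.
    """
    return {
        unicodedata.name(ch, "UNNAMED").split()[0]
        for ch in text
        if ord(ch) > 0x7F and ch.isalpha()
    }


if __name__ == "__main__":
    for row in describe_non_ascii("\u03a5\u03bf\u057dr"):
        print(row)
    print(scripts_in("\u03a5\u03bf\u057dr"))
```

A single "word" that mixes Greek, Armenian, and Latin letters is a fairly strong hint of spoofing, even before any mapping back to English is attempted.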

Your solution, if you want to build one, is to create a lookup table with, say, English letters as column "English" and French letters as column "French", then populate a lookup key with all possible characters that can spoof each of these letters, including the real letter itself (for example, the real English A maps to "English" A). Then use a Splunk lookup to figure out whether any UTF-8 symbol that appears in a text matches a meaningful letter in a given language.

  1. It will be a real project to produce this table, although I fancy there is Internet help from Dr. Google.
  2. It will be computationally expensive because you need to test lots of hypotheses like a real fraud analyst.
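The lookup idea can be sketched in a few lines of Python. The table below is hypothetical and deliberately tiny: it covers only a handful of look-alike characters of the kind that appear to be in the sample message, and a real table would need far more rows. Unlike the table described above, it omits the identity rows (real A to "English" A) and simply passes unknown characters through. Dropping zero-width characters is an extra step I am assuming is wanted.

```python
# Hypothetical, deliberately tiny lookup: spoof character -> English letter it imitates.
SPOOF_TO_ENGLISH = {
    "\u03a5": "Y",  # Greek capital upsilon
    "\u03bf": "o",  # Greek small omicron
    "\u057d": "u",  # Armenian small seh
    "\u0421": "C",  # Cyrillic capital es
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u0455": "s",  # Cyrillic small dze
    "\u217e": "d",  # small Roman numeral five hundred
    "\u0578": "n",  # Armenian small vo
}

# Invisible characters sometimes inserted between letters; dropped, not counted.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def despoof(text):
    """Return (normalized text, number of spoof characters replaced)."""
    out = []
    hits = 0
    for ch in text:
        if ch in ZERO_WIDTH:
            continue
        if ch in SPOOF_TO_ENGLISH:
            out.append(SPOOF_TO_ENGLISH[ch])
            hits += 1
        else:
            out.append(ch)
    return "".join(out), hits


if __name__ == "__main__":
    spoofed = "\u03a5\u200c\u03bf\u200c\u057d\u200cr \u0421\u0430r\u217e"
    print(despoof(spoofed))
```

An event could then be flagged when the replacement count is above zero. In Splunk the same table would live in a lookup file, and the expensive part remains what point 2 describes: checking every character of every event against it.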

Hope this helps. (For a real fraud analyst, it is probably cheaper to run an image of the "text" through OCR and see what the characters resemble.)


bhupalbobbadi
Path Finder

Yes, that is correct. I'm looking for a Splunk search using rex or any other built-in function that will select the event if it has any of those spoofed English letters. Your analysis of the problem is great. Thank you for the analysis and suggestion here.
