Do you mean the symbols scammers use to spoof Latin letters and letters in other writing systems? This question came up earlier, too. No, those cannot be characterized as non-ASCII characters in the everyday sense. They are Unicode characters (usually encoded as UTF-8) that are legitimate letters in a totally different language but look like letters in a familiar language such as English. As far as I know, there is no built-in function in any programming language that can map those spoofs to English letters or map English letters to all variants of spoofs. Your solution, if you want to build one, is to create a lookup table with one column per target language, say "English" for English letters and "French" for French letters, then populate the lookup key with every character that can spoof each of these letters, including the genuine letter itself (the real English A maps to "English" A). Then use a Splunk lookup to figure out whether any Unicode character that appears in a text matches a meaningful letter in a given language. It will be a real project to produce this table, although I fancy there is Internet help from Dr. Google. It will also be computationally expensive, because you need to test lots of hypotheses, like a real fraud analyst. Hope this helps. (For a real fraud analyst, it is probably cheaper to run an image of the "text" through OCR and see what the characters resemble.)
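To make the lookup-table idea concrete, here is a minimal Python sketch of the same logic outside Splunk. The table `CONFUSABLES` and the helpers `find_spoofs` and `to_latin` are hypothetical names made up for illustration, and the table holds only a handful of Cyrillic and Greek look-alikes; a usable table would be far larger. In Splunk, the same mapping would live in a lookup file rather than a dictionary.

```python
import unicodedata

# Hypothetical, deliberately tiny table: look-alike character -> English letter.
# A real table would need many more entries per letter and per language.
CONFUSABLES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def find_spoofs(text):
    """List (position, character, Unicode name, English look-alike) for each hit."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"), CONFUSABLES[ch])
        for i, ch in enumerate(text)
        if ch in CONFUSABLES
    ]


def to_latin(text):
    """Replace every known look-alike with the English letter it imitates."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    sample = "p\u0430yp\u0430l"  # both "a" characters are Cyrillic
    for hit in find_spoofs(sample):
        print(hit)
    print(to_latin(sample))
```

Comparing `to_latin(text)` with a list of names you care about is one cheap hypothesis test. Note that a hit alone does not prove spoofing, since the same characters are perfectly normal in genuine Cyrillic or Greek text.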