Splunk Dev

Detecting different character sets in an email subject

sheamus69
Communicator

I want to report on emails whose subjects use different character sets, such as Chinese, Russian, the Greek alphabet, etc.

Is there an easy way to pull out the character encoding from the emails?

e.g.:

Sender: someone@somewhere.com
Sender: 你好,世界
1 Solution

dkeck
Influencer

I found a list of all the Unicode scripts here:

https://www.regular-expressions.info/unicode.html#prop (scroll down to Unicode Scripts and Unicode Blocks).

You could use [^\p{Latin}], since everything you are looking for is non-Latin.

I think that's the closest you can get with rex.
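
As a rough sketch of how that class could be used with rex (subject is a hypothetical field name, and the sample value is taken from the question): punctuation, digits and whitespace belong to the Unicode Common script rather than Latin, so [^\p{Latin}] by itself would also match characters like ! or £. The version below additionally excludes \p{Common} and \p{Inherited}. That is an assumption layered on top of the suggestion above, so treat it as a starting point rather than a definitive pattern.

```spl
| makeresults
| eval subject="你好,世界"
| rex field=subject "(?<non_latin>[^\p{Latin}\p{Common}\p{Inherited}]+)"
| eval has_non_latin=if(isnotnull(non_latin), "yes", "no")
| table subject non_latin has_non_latin
```

Accented letters such as é and í are in the Latin script, so they should not trigger a match with this class.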

dkeck
Influencer

Would be great if you could accept and upvote the question, thank you 🙂


dkeck
Influencer

Hi,

It's a bit bulky, but wouldn't it work to use a regex that matches everything except the characters in your own charset?

Like:

[^:.@,\s+\w+]

This matches the Chinese characters.

Or:

At least for Chinese, there is a way to match all characters with \p{Han}. It seems to work in Splunk.

| makeresults | eval aaa="世界" | rex field=aaa "(?<my_aaa>\p{Han}.*)"
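
If the goal is a report broken down by script, one possible extension is to test each subject against several script properties and label it. This is only a sketch: subject is a hypothetical field name, the list of scripts is illustrative, and it assumes eval's match() accepts \p{...} the same way rex appears to.

```spl
| makeresults
| eval subject="你好,世界"
| eval script=case(match(subject, "\p{Han}"), "Han", match(subject, "\p{Cyrillic}"), "Cyrillic", match(subject, "\p{Greek}"), "Greek", true(), "other")
| stats count by script
```

case() returns the label for the first condition that is true, so a subject that mixes scripts is counted once, under whichever script is tested first.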

sheamus69
Communicator

That was an interesting approach I hadn't considered. The problem I am finding is that there seem to be lots of edge cases, such as é, í, !, ", £, $, %, ... I keep going back and finding more to exclude.

I'm not sure if this is the best way to do it, or if there's an alternative approach.
