Re: TA-mailclient handling foreign characters

LIUJIEER · ‎12-25-2017

This TA works fine with emails in English. However, it does not work with foreign characters such as Chinese or Japanese. I received the debugging message shown below, and indexing stops as soon as there is an email with foreign characters in the inbox, unless I manually remove it.

ERROR ExecProcessor - message from "python /apps/splunk/etc/apps/TA-mailclient/bin/mail.py" ERRORunknown encoding: iso-2022-jp

Is there any way I can index emails with foreign characters with this TA? Alternatively, can I just drop any emails that contain foreign characters?

Thank you

seunomosowon · ‎12-29-2017

Please test 1.3.6dev. It should no longer remove the mails. I'll also check the headers.

Cheers,
Seun

alexstackharbor · ‎01-02-2018

Hi Seun,

I tested 1.3.6dev and 1.3.5dev quickly but I noticed that right away I started seeing the error 'Mail found with unexpected codec - '.

This is happening for mails that were previously being indexed properly also, so I have reverted to 1.3.5 for now which is working properly.

I would like to thank you again for all your efforts, these improvements make life easier for us as admins.

I was busy for a bit during the holidays but tomorrow we are back in the office and I can help you test future builds as they become ready.

Best Regards!

seunomosowon · ‎01-02-2018

Thanks. I’ll try to update it again over the weekend and test with a few samples.

seunomosowon · ‎12-28-2017

Alright, I'll make some changes tomorrow and do some tests. I'll add a comment on here once I have a way to fix it.

seunomosowon · ‎12-28-2017

Hi Again,

That encoding is not supported by the Python included with Splunk. Splunk Enterprise packages only a small subset of the full Python libraries. I'm not sure how the missing libraries can be added to Splunk, so it's probably best to leave that for now.

For now, I edited the code to try to make it read it as ascii and escape characters greater than 127.
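A minimal sketch of that kind of fallback, not the TA's actual code. It assumes Python 3, and the helper name `decode_payload` is hypothetical. When the declared charset is unknown to the interpreter, the bytes are read as ASCII and anything above 127 is escaped:

```python
def decode_payload(raw: bytes, charset: str) -> str:
    """Decode with the declared charset, falling back to escaped ASCII.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: this Python build has no codec with that name.
        # UnicodeDecodeError: the bytes do not match the declared charset.
        # Either way, keep ASCII as-is and escape bytes above 127 as \xNN.
        return raw.decode("ascii", errors="backslashreplace")
```

The escaped output is not readable text, but it lets the message be indexed instead of stopping the input.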

Please try version 1.3.5 of the app, and let me know how it goes. Still needs a bit of testing, but I've made it available for now, so you can try it out.

Cheers,
Seun

alexstackharbor · ‎12-28-2017

Hi Seun,

We will try it out too and let you know how it goes.

Thank you for your efforts.

LIUJIEER · ‎12-28-2017

Hi Seun,

I just tried v1.3.5 with 2 test cases.

Test Case 1: Email with Chinese characters in the content.
Mail parsed through successfully

Test Case 2: Email with Chinese characters in the subject.
Mail failed to be parsed

Perhaps you could modify the code to handle the second case as well.
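For reference, subjects with non-ASCII characters arrive as RFC 2047 encoded words, which Python's `email.header.decode_header` splits into (value, charset) pairs. The sketch below assumes Python 3 and uses a hypothetical helper name, `decode_subject`; it is not taken from the TA:

```python
from email.header import decode_header


def decode_subject(raw_subject: str) -> str:
    """Decode an RFC 2047 subject, tolerating unknown charsets.

    Hypothetical helper for illustration only.
    """
    parts = []
    for value, charset in decode_header(raw_subject):
        if isinstance(value, bytes):
            try:
                parts.append(value.decode(charset or "ascii"))
            except (LookupError, UnicodeDecodeError):
                # Unknown or mismatched charset: escape bytes above 127.
                parts.append(value.decode("ascii", errors="backslashreplace"))
        else:
            # Unencoded subjects come back as plain str.
            parts.append(value)
    return "".join(parts)
```

With this approach a subject in an unsupported charset is indexed with escaped bytes rather than causing the whole mail to fail.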

Thank you for the help!

alexstackharbor · ‎12-28-2017

Hi seunomosowon,

We are having a very similar issue relating to the indexing of emails generated by WHM/cPanel servers, thank you for addressing this and we are looking forward to the update!

ddrillic · ‎12-26-2017

Interesting comment at ISO/IEC 2022

-- Although ISO/IEC 2022 character sets using control sequences are still in common use, particularly ISO-2022-JP, most modern e-mail applications are converting to use the simpler Unicode transforms such as UTF-8.

LIUJIEER · ‎12-26-2017

I noticed that in props.conf:
CHARSET=AUTO

Why isn't the above setting enough to recognize foreign characters when parsing the mails?

seunomosowon · ‎12-26-2017

I'll update it in a week to catch the exception and ignore these types of mails for now. It looks like it failed because your system's or Splunk's Python doesn't have support for this particular encoding. It converts others to UTF-8.
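One way to detect such mails up front, sketched under the assumption of Python 3 and with a hypothetical function name (`unsupported_charsets`), is to ask the interpreter whether it has a codec for each MIME part's declared charset before trying to decode anything:

```python
import codecs
from email import message_from_bytes


def unsupported_charsets(raw_message: bytes) -> set:
    """Return the declared charsets this Python build cannot decode.

    Hypothetical helper for illustration only.
    """
    msg = message_from_bytes(raw_message)
    missing = set()
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset is None:
            continue  # no charset declared on this part
        try:
            codecs.lookup(charset)
        except LookupError:
            missing.add(charset)
    return missing
```

A caller could skip any message for which this returns a non-empty set and carry on with the rest of the mailbox.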

Cheers

xmertens · ‎02-12-2018

Today, my Splunk did not process the configured mailbox for a few hours... After some investigation, I found the same issue:

02-12-2018 17:40:13.739 +0100 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/TA-mailclient/bin/mail.py" ERRORunknown encoding: windows-874

The blocked email did indeed contain a MIME part with this encoding...

/x

Anam · ‎02-12-2018

Hi xmertens

If none of the answers here worked for you, please post this as a new question. Since this question was posted ~2 months ago, a new post will get more exposure and better help with your problem.

Thanks

seunomosowon · ‎12-29-2017

I’ll look into it next week, sure.
Thanks.

LIUJIEER · ‎01-01-2018

Hi Seun,

I just tried v1.3.6dev with 2 test cases.

Test Case 1: Email with Chinese characters in the content.
Mail parsed through successfully

Test Case 2: Email with Chinese characters in the subject.
Mail failed to be parsed

Perhaps you could modify the code to handle the second case as well.

Thank you for the help!

seunomosowon · ‎12-29-2017

For now, it will discard it, until I have some time to work out a different way to handle the encoding. Some codecs are just not supported by the limited Python included with Splunk. You still get the headers.

You can use read only.

alexstackharbor · ‎12-29-2017

Hi Seun,
For our particular use case we are indexing thousands of emails daily that are generated by WHM servers so read only isn't really ideal as they would build up really fast.

We have some alerting in place for certain email events, so the previous behavior of stalling indexing when it came across something it couldn't index was a much bigger problem for us: we needed something to alert us that indexing had stalled, and it still required manual intervention to delete the email it had failed to index. Losing the odd email containing unsupported characters is a much smaller issue, as we would generally just delete them anyway (it's usually just things like failed SSH logins or failed IMAP logins with unsupported characters that stop the processing for us).

That being said, maybe a great option would be to either:
A, leave them in place but continue to process the others deleting only what was successfully indexed
or
B, and perhaps an even better option - Move them to a different IMAP folder ( for example, dump them in a folder named processing_failed ).
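Option B could look roughly like the sketch below. It uses the standard `imaplib` calls `uid` and `expunge`; the function name `quarantine_message` is hypothetical, the folder is assumed to already exist on the server, and none of this is taken from the TA's code:

```python
import imaplib


def quarantine_message(conn: imaplib.IMAP4, uid: str,
                       folder: str = "processing_failed") -> bool:
    """Copy a message to `folder`, then delete the original.

    Hypothetical helper for illustration only. `conn` must already be
    logged in with the source mailbox selected read-write.
    """
    status, _ = conn.uid("COPY", uid, folder)
    if status != "OK":
        # Copy failed (e.g. folder missing): leave the original in place.
        return False
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    # Note: expunge removes every message flagged \Deleted in the mailbox.
    conn.expunge()
    return True
```

Copying before deleting means a failed copy never loses the mail, which also gives option A as the fallback behavior.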

In any case, we are already very happy with the improvements you have made; they have made a big difference for us, so thank you very much for that.

On a side note, we had the option to include headers disabled on every account it's indexing, but it was still including headers for most of the accounts. I'm not sure why it would do that, but I was able to resolve it by setting DEFAULT_INCLUDE_HEADERS = False in mail_constants.py

Funny thing is, it was working at first, and I think it started including headers when we upgraded to Splunk Enterprise 7.0.1 (it could be a coincidence; I am not certain this is related, but it was definitely around the same time).

Anyways, thanks again for your efforts I am sure there are others out there who really appreciate it too!

Regards.

seunomosowon · ‎12-29-2017

Thanks. I'll push out an update in a couple of minutes. It would leave mails that are encoded in an unsupported codec untouched.

LIUJIEER · ‎12-26-2017

Thank you and appreciate your help!

seunomosowon · ‎01-06-2018

Hey Alex,

I'm about to upload 1.3.7dev. Please try it out and let me know how it goes. I actually tried sending some kanji, but my system sends it out using UTF-8, so it works.
I'll find a way to test "ISO-2022-JP" at some point.

Thanks.
