TA-mailclient handling foreign characters

LIUJIEER
Explorer

This TA works fine with emails in English. However, it does not work for foreign characters such as Chinese or Japanese. I received the debugging message shown below, and indexing stops as soon as there is an email with foreign characters in the inbox, unless I manually remove it.

ERROR ExecProcessor - message from "python /apps/splunk/etc/apps/TA-mailclient/bin/mail.py" ERRORunknown encoding: iso-2022-jp

Is there any way I can index emails with foreign characters with this TA? Alternatively, can I just drop any emails that contain foreign characters?

Thank you

seunomosowon
Communicator

Please test 1.3.6dev. It should no longer remove the mails. I'll also check the headers.

Cheers,
Seun

0 Karma

alexstackharbor
New Member

Hi Seun,

I quickly tested 1.3.6dev and 1.3.5dev, and right away I started seeing the error 'Mail found with unexpected codec - '.

This is also happening for mails that were previously indexed properly, so I have reverted to 1.3.5 for now, which is working properly.

I would like to thank you again for all your efforts, these improvements make life easier for us as admins.

I was busy for a bit during the holidays but tomorrow we are back in the office and I can help you test future builds as they become ready.

Best Regards!

0 Karma

seunomosowon
Communicator

Thanks. I’ll try to update it again over the weekend and test with a few samples.

0 Karma

seunomosowon
Communicator

Alright, I'll make some changes tomorrow and do some tests. I'll add a comment on here once I have a way to fix it.

0 Karma

seunomosowon
Communicator

Hi Again,

That encoding is not supported by the Python bundled with Splunk. Splunk Enterprise packages only a small subset of the full Python libraries. I'm not sure how the missing libraries can be added to Splunk, so it's probably best to leave that for now.

For now, I edited the code to fall back to reading such mail as ASCII and escaping characters greater than 127.
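The fallback described above could look roughly like the sketch below. It is not the TA's actual code: the function name `decode_with_fallback` is hypothetical, and it assumes Python 3.5 or later, where the `backslashreplace` error handler also works for decoding.

```python
def decode_with_fallback(raw: bytes, charset: str) -> str:
    """Decode with the declared charset, or fall back to escaped ASCII."""
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: this Python build does not know the codec at all.
        # UnicodeDecodeError: the bytes do not match the declared charset.
        # Either way, read as ASCII and escape every byte above 127 as \xNN.
        return raw.decode("ascii", errors="backslashreplace")


print(decode_with_fallback(b"caf\xe9", "no-such-codec"))
```

Note that ISO-2022-JP is a 7-bit encoding, so a fallback like this would not fail on it, but it would index the raw escape sequences rather than readable Japanese text.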

Please try version 1.3.5 of the app, and let me know how it goes. Still needs a bit of testing, but I've made it available for now, so you can try it out.

Cheers,
Seun

alexstackharbor
New Member

Hi Seun,

We will try it out too and let you know how it goes.

Thank you for your efforts.

0 Karma

LIUJIEER
Explorer

Hi Seun,

I just tried v1.3.5 with 2 test cases.

Test Case 1: Email with Chinese characters in the content.
Mail parsed through successfully

Test Case 2: Email with Chinese characters in the subject.
Mail failed to be parsed

Perhaps you could modify the code to handle the second case as well.
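Non-ASCII subjects normally arrive as RFC 2047 encoded words (for example `=?utf-8?b?...?=`), so they need a separate decoding step from the body. A minimal sketch of that step with the standard library's `email.header.decode_header` follows. The helper name `decode_subject` is hypothetical, this is not how the TA is known to do it, and it assumes Python 3.

```python
from email.header import decode_header


def decode_subject(raw_subject: str) -> str:
    """Decode an RFC 2047 encoded Subject header into text."""
    parts = []
    for value, charset in decode_header(raw_subject):
        if isinstance(value, bytes):
            try:
                value = value.decode(charset or "ascii")
            except (LookupError, UnicodeDecodeError):
                # Unknown or mismatched charset: keep the header, escaped,
                # instead of failing the whole mail.
                value = value.decode("ascii", errors="backslashreplace")
        parts.append(value)
    return "".join(parts)


print(decode_subject("=?utf-8?b?5Lit5paH?="))
```

With a fallback like this, a subject in an unsupported charset would be indexed in escaped form rather than causing the mail to be rejected.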

Thank you for the help!

0 Karma

alexstackharbor
New Member

Hi seunomosowon,

We are having a very similar issue with indexing emails generated by WHM/cPanel servers. Thank you for addressing this; we are looking forward to the update!

0 Karma

ddrillic
Ultra Champion

Interesting comment at ISO/IEC 2022

-- Although ISO/IEC 2022 character sets using control sequences are still in common use, particularly ISO-2022-JP, most modern e-mail applications are converting to use the simpler Unicode transforms such as UTF-8.

0 Karma

LIUJIEER
Explorer

I noticed that in props.conf:
CHARSET=AUTO

Why isn't the above setting enough to recognize foreign characters while parsing the mails?

0 Karma

seunomosowon
Communicator

I'll update it in a week to catch the exception and ignore these types of mails for now. It looks like it failed because your system's or Splunk's Python doesn't have support for this particular encoding. It converts other encodings to UTF-8.
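One way to catch this up front is to ask Python whether it knows each charset a mail declares before trying to decode it. The sketch below uses `codecs.lookup`, which raises `LookupError` for an unknown encoding. The function name `unsupported_charsets` is hypothetical and this is not the TA's code; it assumes Python 3.

```python
import codecs
from email import message_from_bytes


def unsupported_charsets(raw_mail: bytes) -> list:
    """Return the declared charsets that this Python build cannot decode."""
    msg = message_from_bytes(raw_mail)
    missing = []
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset is None:
            continue  # part declares no charset
        try:
            codecs.lookup(charset)
        except LookupError:
            missing.append(charset)
    return missing


sample = b"Content-Type: text/plain; charset=x-no-such-codec\r\n\r\nbody"
print(unsupported_charsets(sample))
```

A caller could skip any mail for which the returned list is non-empty and carry on with the rest of the mailbox.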

Cheers

0 Karma

xmertens
New Member

Today, my Splunk did not process the configured mailbox for a few hours... After some investigation, I found the same issue:

02-12-2018 17:40:13.739 +0100 ERROR ExecProcessor - message from "python /opt/splunk/etc/apps/TA-mailclient/bin/mail.py" ERRORunknown encoding: windows-874

The blocked email did indeed contain a MIME part with this encoding...
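In a full CPython install the Thai Windows code page is available under the codec name `cp874`, but whether the label `windows-874` resolves to it depends on the Python build, and it is not known here whether Splunk's bundled Python ships `cp874` at all. Under those assumptions, a small alias lookup could be sketched as follows. `CHARSET_ALIASES` and `resolve_charset` are hypothetical names, not part of the TA.

```python
import codecs

# Hypothetical alias table: MIME charset labels mapped to Python codec names.
CHARSET_ALIASES = {"windows-874": "cp874"}


def resolve_charset(label: str):
    """Return a usable Python codec name for a MIME charset label, or None."""
    for candidate in (label, CHARSET_ALIASES.get(label.lower())):
        if candidate is None:
            continue
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue  # not known under this name; try the next candidate
    return None


print(resolve_charset("windows-874"))
```

If the function returns `None`, the codec really is missing from the interpreter and the mail would have to be skipped or decoded with a fallback.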

/x

0 Karma

Anam
Community Manager

Hi xmertens

If none of the answers here worked for you, please post this as a new question. Since this question was posted ~2 months ago, a reply here won't get as much exposure, and a new question is more likely to get you help with your problem.

Thanks

0 Karma

seunomosowon
Communicator

I’ll look into it next week, sure.
Thanks.

0 Karma

LIUJIEER
Explorer

Hi Seun,

I just tried v1.3.6dev with 2 test cases.

Test Case 1: Email with Chinese characters in the content.
Mail parsed through successfully

Test Case 2: Email with Chinese characters in the subject.
Mail failed to be parsed

Perhaps you could modify the code to handle the second case as well.

Thank you for the help!

0 Karma

seunomosowon
Communicator

For now, it will discard the mail, until I have some time to work out a different way to handle the encoding. Some codecs are just not supported by the limited Python included with Splunk. You still get the headers.

You can use read only.

0 Karma

alexstackharbor
New Member

Hi Seun,
For our particular use case, we are indexing thousands of emails daily that are generated by WHM servers, so read only isn't really ideal, as they would build up really fast.

We have alerting in place for certain email events, so the previous behavior of stalling indexing when it hit something it couldn't index was a much bigger problem for us: we needed something to alert us that indexing had stalled, and it still required manual intervention to delete the email that had failed to index. Losing the odd email containing unsupported characters is a much smaller issue, as we would generally just delete them anyway (it's usually things like failed SSH logins or failed IMAP logins with unsupported characters that stop the processing for us).

That being said, maybe a great option would be to either:
A, leave them in place but continue to process the others, deleting only what was successfully indexed
or
B, perhaps an even better option: move them to a different IMAP folder (for example, dump them in a folder named processing_failed).
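Option B can be sketched with the standard library's `imaplib`. This is only an illustration, not the TA's implementation: `move_to_failed` is a hypothetical helper, and it assumes `conn` is an already logged-in `imaplib.IMAP4` connection with the source mailbox selected read-write.

```python
import imaplib

FAILED_FOLDER = "processing_failed"  # folder name suggested in the post above


def move_to_failed(conn: imaplib.IMAP4, msg_num: str,
                   folder: str = FAILED_FOLDER) -> bool:
    """Copy one message to `folder`, then flag the original as deleted."""
    # If the folder already exists the server just answers NO, which is fine.
    conn.create(folder)
    status, _ = conn.copy(msg_num, folder)
    if status != "OK":
        return False  # copy failed: leave the message where it is
    conn.store(msg_num, "+FLAGS", "\\Deleted")
    # No expunge here: expunging renumbers message sequence numbers, so the
    # caller should call conn.expunge() once after the whole run.
    return True
```

A polling loop could call `move_to_failed(conn, num)` for each mail it cannot decode and then call `conn.expunge()` at the end, so one bad mail no longer blocks the rest.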

In any case, we are already very happy with the improvements you have made. They have made a big difference for us, so thank you very much for that.

On a side note, we had the option to include headers disabled on every account it's indexing, but it was still including headers for most of the accounts. I'm not sure why it would do that, but I was able to resolve it by setting DEFAULT_INCLUDE_HEADERS = False in mail_constants.py.

Funny thing is, it was working at first, and I think it started including headers when we upgraded to Splunk Enterprise 7.0.1. (It could be a coincidence; I am not certain the two are related, but it was definitely around the same time.)

Anyway, thanks again for your efforts. I am sure there are others out there who really appreciate it too!

Regards.

0 Karma

seunomosowon
Communicator

Thanks. I'll push out an update in a couple of minutes. It will leave mails that are encoded in an unsupported codec untouched.

0 Karma

LIUJIEER
Explorer

Thank you and appreciate your help!

0 Karma

seunomosowon
Communicator

Hey Alex,

I'm about to upload 1.3.7dev. Please try it out and let me know how it goes. I actually tried sending some kanji, but my system sends it out as UTF-8, so it works.
I'll find a way to test "ISO-2022-JP" at some point.

Thanks.

0 Karma