Splunk Search

how to train Splunk to recognize a character set

alextsui
Path Finder

Hello. My logs contain Simple Chinese characters. After setting CHARSET = GB2312 in the props.conf, some Chinese characters showed up correctly and some didn't. GB2312 encoding is a bit old. GB13000 is the current standard, and it recognizes more characters then GB2312 does. I figure if I can train Splunk to use GB13000 instead of GB2312, it may solve my problem. In the admin manual (http://www.splunk.com/base/Documentation/latest/Admin/Configurecharactersetencoding) it mentions that a sample character set specification file can be added to $SPLUNK_HOME/etc/ngram-models/ to train Splunk to recognize the character set. How do I create such file? Where can I find more information on this topic?

Thanks.

Tags (1)
0 Karma
1 Solution

Stephen_Sorkin
Splunk Employee
Splunk Employee

Adding samples to ngram-models simply assists Splunk in guessing a CHARSET that we already support. It cannot be used to add support for a new charset. We have in product support for GB18030, GB231280 and GBK in addition to GB2312.

View solution in original post

0 Karma

Stephen_Sorkin
Splunk Employee
Splunk Employee

Adding samples to ngram-models simply assists Splunk in guessing a CHARSET that we already support. It cannot be used to add support for a new charset. We have in product support for GB18030, GB231280 and GBK in addition to GB2312.

0 Karma

alextsui
Path Finder

Thank you, Stephen.
I changed the props.conf to CHARSET=GB18030, and the problem was solved.

0 Karma
Get Updates on the Splunk Community!

Observability Highlights | January 2023 Newsletter

 January 2023New Product Releases Splunk Network Explorer for Infrastructure MonitoringSplunk unveils Network ...

Security Highlights | January 2023 Newsletter

January 2023 Splunk Security Essentials (SSE) 3.7.0 ReleaseThe free Splunk Security Essentials (SSE) 3.7.0 app ...

Platform Highlights | January 2023 Newsletter

 January 2023Peace on Earth and Peace of Mind With Business ResilienceAll organizations can start the new year ...