How to train Splunk to recognize a character set

alextsui
Path Finder

Hello. My logs contain Simplified Chinese characters. After I set CHARSET = GB2312 in props.conf, some Chinese characters showed up correctly and some didn't. The GB2312 encoding is a bit old. GB13000 is the current standard, and it recognizes more characters than GB2312 does. I figure that if I can train Splunk to use GB13000 instead of GB2312, it may solve my problem. The admin manual (http://www.splunk.com/base/Documentation/latest/Admin/Configurecharactersetencoding) mentions that a sample character set specification file can be added to $SPLUNK_HOME/etc/ngram-models/ to train Splunk to recognize the character set. How do I create such a file? Where can I find more information on this topic?

Thanks.


Stephen_Sorkin
Splunk Employee

Adding samples to ngram-models only helps Splunk guess a CHARSET that we already support. It cannot be used to add support for a new charset. We have in-product support for GB18030, GB231280, and GBK in addition to GB2312.

alextsui
Path Finder

Thank you, Stephen.
I changed the props.conf to CHARSET=GB18030, and the problem was solved.
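
For anyone wondering why the switch helps, here is a minimal sketch in Python. It uses Python's built-in codecs as a stand-in for Splunk's CHARSET handling, which is an assumption made for illustration and not a description of how Splunk decodes data internally. The sample string and the helper name `decodes_cleanly` are made up for this example. The string contains the character "镕", which is outside the GB2312 repertoire but is covered by GBK and GB18030.

```python
# Illustration only: Python codecs stand in for Splunk's CHARSET setting.
# The sample text and helper name are hypothetical.
text = "朱镕基"  # the middle character is not part of GB2312

# Bytes as they might appear in a log file written in GB18030.
raw = text.encode("gb18030")


def decodes_cleanly(data: bytes, charset: str) -> bool:
    """Return True if `data` decodes under `charset` back to the original text."""
    try:
        return data.decode(charset) == text
    except UnicodeDecodeError:
        return False


for charset in ("gb2312", "gbk", "gb18030"):
    print(charset, decodes_cleanly(raw, charset))
```

Only the narrower GB2312 codec fails on this sample, which matches the symptom of some characters displaying correctly and others not.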
