Splunk Search

how to train Splunk to recognize a character set

alextsui
Path Finder

Hello. My logs contain Simple Chinese characters. After setting CHARSET = GB2312 in the props.conf, some Chinese characters showed up correctly and some didn't. GB2312 encoding is a bit old. GB13000 is the current standard, and it recognizes more characters then GB2312 does. I figure if I can train Splunk to use GB13000 instead of GB2312, it may solve my problem. In the admin manual (http://www.splunk.com/base/Documentation/latest/Admin/Configurecharactersetencoding) it mentions that a sample character set specification file can be added to $SPLUNK_HOME/etc/ngram-models/ to train Splunk to recognize the character set. How do I create such file? Where can I find more information on this topic?

Thanks.

Tags (1)
0 Karma
1 Solution

Stephen_Sorkin
Splunk Employee
Splunk Employee

Adding samples to ngram-models simply assists Splunk in guessing a CHARSET that we already support. It cannot be used to add support for a new charset. We have in product support for GB18030, GB231280 and GBK in addition to GB2312.

View solution in original post

0 Karma

Stephen_Sorkin
Splunk Employee
Splunk Employee

Adding samples to ngram-models simply assists Splunk in guessing a CHARSET that we already support. It cannot be used to add support for a new charset. We have in product support for GB18030, GB231280 and GBK in addition to GB2312.

0 Karma

alextsui
Path Finder

Thank you, Stephen.
I changed the props.conf to CHARSET=GB18030, and the problem was solved.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Hello Splunkers,   So you searched, “what is the name of the usb key inserted by bob smith?”  Not gonna lie… ...

Automating Threat Operations and Threat Hunting with Recorded Future

    Automating Threat Operations and Threat Hunting with Recorded Future June 29, 2026 | Register   Is your ...