
Lookup files with foreign characters

gpburgett
Splunk Employee

I am setting up an app for a financial customer in Korea. They are using a standardized business reporting language that is entirely in English. I've indexed the data and extracted all the necessary fields, but the customer needs the values of some fields translated into Korean. I've created a CSV file with all of the English terms and their Korean equivalents and put it under system>lookups, but when I restart Splunk and run the search on that sourcetype, the following message comes up:

[EventsViewer module] Input is not proper UTF-8, indicate encoding ! Bytes: 0xB1 0xB8 0xBA 0xD0, line 59, column 8

I tried editing the CHARSET in props.conf, I've changed the format of the file to UTF-8, and I even tried using the Korean character set that Splunk supports, but I still get the same message. Does Splunk not support lookups containing non-English characters? Or am I missing something in my configuration?
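The four bytes quoted in the error can be checked directly. A byte such as 0xB1 can never start a character in UTF-8, so the sequence is rejected as UTF-8, but it decodes cleanly as two Hangul syllables in the legacy Korean encodings EUC-KR and CP949. That suggests (though it does not prove) that the lookup file was actually saved in one of those encodings. A minimal Python sketch; the helper name `try_decode` is illustrative:

```python
import unicodedata

# The four bytes quoted in the Splunk error message.
bad_bytes = bytes([0xB1, 0xB8, 0xBA, 0xD0])


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if data is not valid in this encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc in ("utf-8", "euc_kr", "cp949"):
    text = try_decode(bad_bytes, enc)
    if text is None:
        print(f"{enc}: not valid")
    else:
        # Show the Unicode name of each decoded character.
        names = ", ".join(unicodedata.name(ch) for ch in text)
        print(f"{enc}: {text!r} ({names})")
```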


mitch_1
Splunk Employee

The CHARSET setting in props.conf only applies to index-time processing; that is, it tells Splunk which character set the data being indexed is in. It doesn't affect anything at search time, which is when lookups are applied.

Do you know what character set your CSV file is using (UTF-16? KSC-5601?)? You'll probably need to convert it to UTF-8. If you're on UNIX, you can use the system "iconv" utility to do this. I'm sure there are similar utilities available for Windows. Also, most editors have options for saving a file in a particular encoding.
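Where "iconv" isn't available, the same conversion can be done in a few lines of Python. This is a minimal sketch assuming the source file is in CP949 (a superset of EUC-KR/KSC-5601); it is roughly what `iconv -f CP949 -t UTF-8` does. The function name and file names are hypothetical. Decoding is strict, so a wrong guess about the source encoding raises an error instead of silently writing garbage.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp949") -> None:
    """Re-encode the file at src from source_encoding to UTF-8, writing it to dst."""
    # Working on raw bytes keeps the original line endings untouched.
    # Strict decoding raises UnicodeDecodeError if source_encoding is wrong.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "terms_cp949.csv"
        dst = Path(tmp) / "terms_utf8.csv"
        # Hypothetical two-column lookup: English term, Korean equivalent.
        src.write_bytes("english,korean\nrevenue,매출\n".encode("cp949"))
        convert_to_utf8(src, dst)
        print(dst.read_text(encoding="utf-8"))
```

If the file turns out to be UTF-16 instead, passing `source_encoding="utf-16"` handles that case the same way.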

gpburgett
Splunk Employee

Thanks for the tip. It turned out that even though the text editor I was using was set to UTF-8, it wasn't converting the file properly. I used a dedicated converter to change it to UTF-8, and now it works fine.
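Since the editor here claimed UTF-8 but didn't actually produce it, it can be worth verifying a lookup file before restarting Splunk. A minimal sketch that reports where the first invalid UTF-8 sequence sits; the function names are hypothetical, line and column are 1-based, and the column counts bytes, which may not match how Splunk counts columns in its own message.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, bytes]]:
    """Return (line, column, offending bytes) for the first invalid UTF-8
    sequence in data, or None if data is entirely valid UTF-8."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        prefix = data[:err.start]
        line = prefix.count(b"\n") + 1
        # rfind returns -1 when there is no newline, so the line starts at 0.
        line_start = prefix.rfind(b"\n") + 1
        column = err.start - line_start + 1
        return line, column, data[err.start:err.end]


def check_file(path: str) -> None:
    """Print whether the file at path is valid UTF-8."""
    problem = first_invalid_utf8(Path(path).read_bytes())
    if problem is None:
        print(f"{path}: valid UTF-8")
    else:
        line, column, chunk = problem
        shown = " ".join(f"0x{b:02X}" for b in chunk)
        print(f"{path}: invalid UTF-8 at line {line}, column {column}: {shown}")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        check_file(arg)
```

Run it as, for example, `python check_utf8.py translations.csv` (both names hypothetical).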
