Splunk Search

Lookup files with foreign characters

gpburgett
Splunk Employee

I am setting up an app for a financial customer in Korea. They use a standardized business reporting language that is entirely in English. I've indexed the data and extracted all the necessary fields, but the customer needs the terms in some fields translated into Korean. I created a CSV file with all of the English terms and their Korean equivalents and put it under system>lookups, but when I restart Splunk and run a search on that sourcetype, the following message comes up:

[EventsViewer module] Input is not proper UTF-8, indicate encoding ! Bytes: 0xB1 0xB8 0xBA 0xD0, line 59, column 8

I tried editing the CHARSET in props.conf, I changed the format of the file to UTF-8, and I even tried the Korean character set that Splunk supports, but I still get the same message. Does Splunk not support lookups with foreign characters, or am I missing something in my configuration?

1 Solution

mitch_1
Splunk Employee

The CHARSET setting in props.conf only applies to index-time processing, i.e. it tells Splunk which character set the data being indexed is in. It doesn't affect anything at search time, so it has no effect on how a lookup file is read.

Do you know what character set your CSV file is using (UTF-16? KSC-5601?)? You'll probably need to convert it to UTF-8. If you're on UNIX, you can use the system "iconv" utility to do this, and I'm sure there are similar utilities available for Windows. Most editors also have an option for saving a file in a particular encoding.
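If iconv isn't handy, the same conversion can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the lookup file is encoded as CP949 (a superset of EUC-KR, which is a guess based on the bytes in the error message and not something confirmed in this thread), and the function name and file names are hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp949"):
    """Re-encode a text file to UTF-8.

    source_encoding is an assumption; change it to whatever the file
    really uses (for example "euc_kr" or "utf-16").
    """
    # Work on raw bytes so line endings are passed through unchanged.
    raw = Path(src).read_bytes()
    # Raises UnicodeDecodeError if the guessed source encoding is wrong.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Hypothetical usage:
# convert_to_utf8("terms_kr.csv", "terms_kr_utf8.csv")
```

Decoding with the wrong source encoding normally fails with a UnicodeDecodeError instead of silently writing garbage, which makes a bad guess easy to spot.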

gpburgett
Splunk Employee

Thanks for the tip. It turned out that even though the text editor I was using was set to UTF-8, it wasn't converting the file properly. I used a specialized converter to change the file to UTF-8, and now it works fine.
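For anyone who hits the same problem, a quick way to confirm whether a file really is UTF-8 before restarting Splunk is to try decoding it. The sketch below is illustrative only: the function name is hypothetical, and the line and column it reports are byte-based and 1-indexed, which is an assumption and may not match how Splunk counts positions in its own error message.

```python
def find_invalid_utf8(data: bytes):
    """Return (line, column, offending_bytes) for the first invalid
    UTF-8 sequence in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get a 1-indexed line number.
        line = data.count(b"\n", 0, err.start) + 1
        # Byte offset within that line, also 1-indexed.
        line_start = data.rfind(b"\n", 0, err.start) + 1
        column = err.start - line_start + 1
        # Show up to four bytes starting at the failure point.
        return line, column, data[err.start:err.start + 4]
    return None


# Hypothetical usage:
# with open("terms_kr.csv", "rb") as fh:
#     print(find_invalid_utf8(fh.read()))
```

A result of None means the file decodes cleanly as UTF-8; anything else points at the first byte sequence that still needs converting.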
