I am setting up an app for a financial customer in Korea. They are using a standardized business reporting language that is all in English. I've indexed it and extracted all the necessary fields, but the customer needs the terms of some fields to be translated into Korean. I've created a csv file with all of the English terms and their Korean equivalents and put it under system>lookups, but when I restart Splunk and run the search on that sourcetype the following message comes up:
[EventsViewer module] Input is not proper UTF-8, indicate encoding ! Bytes: 0xB1 0xB8 0xBA 0xD0, line 59, column 8
I tried editing the charset in props.conf, changed the format of the file to UTF-8, and even tried using the Korean character set that Splunk supports, but I still get the same message. Does Splunk not support foreign character lookups? Or am I missing something in my configuration?
The CHARSET setting in props.conf only applies to index-time processing, i.e. it tells Splunk which character set the data being indexed is in. It doesn't affect anything at search time.
Do you know what character set your CSV file is using (UTF-16? KSC-5601?)? You will probably need to convert it to UTF-8. If you're on UNIX you can use the system "iconv" utility to do this, and I'm sure there are similar utilities available for Windows. Most editors also have an option for saving a file in a particular encoding.
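If you'd rather script the conversion, here is a minimal Python sketch. It assumes the lookup file is currently in EUC-KR; that is only a guess, so change source_encoding to whatever the file really uses (for example "cp949" or "utf-16"). The function name and the file names in the usage comment are made up for illustration.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="euc_kr"):
    """Re-encode a text file from source_encoding to UTF-8.

    source_encoding defaults to EUC-KR, which is an assumption about
    the lookup file and not something Splunk reports.
    """
    raw = Path(src).read_bytes()
    # Strict decoding: bytes that are not valid in source_encoding raise
    # UnicodeDecodeError instead of being silently replaced.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Hypothetical usage:
# convert_to_utf8("terms_ko.csv", "terms_ko_utf8.csv")
```

Writing to a separate output file keeps the original around in case the encoding guess turns out to be wrong.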
Thanks for the tip. It turned out that even though the text editor I was using was set for UTF-8 it wasn't converting the file properly. I used a specialized converter to change it to UTF-8 and now it works fine.
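For anyone else who runs into this: since an editor can claim to save UTF-8 without actually doing so, it may be worth checking the file yourself before restarting Splunk. A rough sketch in Python, where the function name and the file name in the usage comment are hypothetical:

```python
def first_invalid_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the file decodes cleanly."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# Hypothetical usage:
# offset = first_invalid_utf8("terms_ko.csv")
# print("file is valid UTF-8" if offset is None else f"invalid byte at offset {offset}")
```

Note that a clean result only shows the bytes are valid UTF-8. It does not prove the characters are the ones you intended, so it is still worth opening the file and looking at the Korean text.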