I'm trying to index a CSV file in which multiple values contain special characters such as æ, ø, å and | (vertical bar). The problem is that these characters are indexed as '\xF8', '\xE6' and the like, and some strings have '?' inserted as the first and/or last character.
When I open the file in Notepad++ or Sublime Text, the special characters appear correctly, and Notepad++ reports the encoding as UTF-8-BOM. I also checked the encoding on a *nix machine with the file command, which returned: Filename.csv: UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators.
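
To double-check what the editors and the file command report, a minimal Python sketch along these lines could be run against the raw bytes of the file. The function name inspect_utf8 is made up for illustration. It only tests two things: whether the data starts with the UTF-8 BOM, and whether every byte sequence is valid UTF-8. A lone byte such as 0xF8 (ø in Latin-1) would fail the second check, which would suggest the file is not UTF-8 throughout.

```python
import codecs


def inspect_utf8(data: bytes) -> dict:
    """Report BOM presence and strict UTF-8 validity for raw file bytes.

    Hypothetical helper for illustration, not part of any Splunk tooling.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")  # strict decoding raises on the first invalid byte
        first_bad_offset = None
    except UnicodeDecodeError as exc:
        first_bad_offset = exc.start  # byte offset of the offending sequence
    return {
        "has_bom": has_bom,
        "valid_utf8": first_bad_offset is None,
        "first_bad_offset": first_bad_offset,
    }
```

For a real file, the bytes can be read with open("Filename.csv", "rb").read() and passed to the function. If valid_utf8 comes back False, first_bad_offset shows where to look in a hex viewer.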
I have tried configuring my props.conf for the input with both:
- CHARSET=AUTO
- CHARSET=UTF-8

But neither of these seems to solve my issue.
I have also tried exporting my CSV file as Unicode and indexing it with the charset set to AUTO and to UCS-2LE, which resulted in many lines being interpreted as Chinese characters.
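
One possible explanation for the Chinese characters (an assumption, not a confirmed diagnosis) is that bytes written in a one-byte-per-character encoding were read as two-byte UTF-16/UCS-2 little-endian code units. The Python sketch below reproduces that effect on a short string. The function name misread_as_utf16le is made up for illustration.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as UTF-16LE.

    Hypothetical helper that simulates an encoding mismatch: each pair of
    bytes is combined into a single 16-bit code unit.
    """
    raw = text.encode("utf-8")
    # errors="replace" avoids an exception on odd-length or invalid input
    return raw.decode("utf-16-le", errors="replace")


# 'a' is 0x61 and 'b' is 0x62; read as one little-endian unit they form U+6261
garbled = misread_as_utf16le("ab")
```

Here garbled is the single character U+6261, which lies in the CJK Unified Ideographs block. Ordinary ASCII text read this way tends to land in that range, which would match the symptom.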
Has anyone experienced and solved something similar?