Hi Splunk community!
I'm trying to index a CSV file where multiple values contain special characters such as æ, ø, å and | (vertical bar).
The problem is that these characters get indexed as '\xF8', '\xE6' and the like, and some strings have '?' inserted as the first and/or last character.
When I open the file using Notepad++ and/or Sublime Text, the special characters appear correctly.
Notepad++ also reports the encoding as UTF-8-BOM.
I also checked the encoding on a *nix machine with the file command, which reported:
Filename.csv: UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators.
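To double-check the BOM outside these editors, here is a quick Python sketch for inspecting a file's leading bytes (the sample string below stands in for the first bytes of the actual file, which I can't share):

```python
# Common byte-order marks and the encodings they imply.
BOMS = {
    b"\xef\xbb\xbf": "UTF-8 with BOM",
    b"\xff\xfe": "UTF-16 LE (UCS-2LE)",
    b"\xfe\xff": "UTF-16 BE",
}

def detect_bom(first_bytes: bytes) -> str:
    """Return the encoding implied by a leading BOM, if any."""
    for bom, name in BOMS.items():
        if first_bytes.startswith(bom):
            return name
    return "no BOM detected"

# Example: the first bytes of a UTF-8-BOM file, as Notepad++ reported.
# In practice you'd pass open("Filename.csv", "rb").read(4) instead.
sample = "\ufeffæøå|".encode("utf-8")
print(detect_bom(sample))  # UTF-8 with BOM
```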
I have tried configuring my props.conf for the input with both:
- CHARSET=AUTO
- CHARSET=UTF-8
But neither of these solves my issue...
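For reference, my props.conf stanza looks roughly like this (the sourcetype name is a placeholder):

```
[my_csv_sourcetype]
CHARSET = UTF-8
```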
I have also tried exporting my CSV file as Unicode and indexing with the charset set to AUTO and UCS-2LE, which resulted in many of the lines being interpreted as Chinese symbols.
Might someone have experienced and solved something similar?
Hi @nc_lks ,
To resolve this issue, ingest the data through the Add Data option, then go to Advanced settings, select Charset, and try the available encodings until one works.
Hi @Vardhan,
Thank you very much for the answer - I hadn't actually thought of that...
I found the flaw to be an encoding error for a specific file introduced at some point in the pipeline.