Hi Splunk community!
I'm trying to index a CSV file where multiple values contain special characters such as æ, ø, å and | (vertical bar).
The problem is that these characters get indexed as '\xF8', '\xE6' and the like, and some strings have '?' inserted as the first and/or last character.
When I open the file using Notepad++ and/or Sublime Text, the special characters appear correctly.
Notepad++ also reports the encoding as UTF-8-BOM.
I also checked the encoding on a *nix machine with the file command, which reported:
Filename.csv: UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators.
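To double-check the BOM outside these editors, here is a quick Python sketch for inspecting a file's leading bytes (the sample string below stands in for the first bytes of the actual file, which I can't share):

```python
# Common byte-order marks and the encodings they imply.
BOMS = {
    b"\xef\xbb\xbf": "UTF-8 with BOM",
    b"\xff\xfe": "UTF-16 LE (UCS-2LE)",
    b"\xfe\xff": "UTF-16 BE",
}

def detect_bom(first_bytes: bytes) -> str:
    """Return the encoding implied by a leading BOM, if any."""
    for bom, name in BOMS.items():
        if first_bytes.startswith(bom):
            return name
    return "no BOM detected"

# Example: the first bytes of a UTF-8-BOM file, as Notepad++ reported.
# In practice you'd pass open("Filename.csv", "rb").read(4) instead.
sample = "\ufeffæøå|".encode("utf-8")
print(detect_bom(sample))  # UTF-8 with BOM
```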
I have tried configuring my props.conf for the input with both:
- CHARSET=AUTO
- CHARSET=UTF-8
But neither of these solves my issue...
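For reference, my props.conf stanza looks roughly like this (the sourcetype name is a placeholder):

```
[my_csv_sourcetype]
CHARSET = UTF-8
```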
I have also tried exporting my CSV file as Unicode and indexing with the charset set to AUTO and UCS-2LE, which resulted in many of the lines being interpreted as Chinese symbols.
Might someone have experienced and solved something similar?
Hi @nc_lks ,
To resolve this issue, ingest the data through the Add Data option, then go to Advanced settings, select Charset, and try the available encodings until one works.
Hi @Vardhan,
Thank you very much for the answer - I hadn't actually thought of that...
I found the flaw to be an encoding error for a specific file introduced at some point in the pipeline.