Splunk Search

Trouble indexing special characters in UTF-8

nc_lks
Engager

Hi Splunk community!

I'm trying to index a CSV file in which multiple values contain special characters such as æøå and | (vertical bar).
The problem is that these characters are indexed as '\xF8', '\xE6' and the like, and some strings have '?' inserted as the first and/or last character.
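
For context on this symptom: '\xF8' and '\xE6' are the single-byte ISO-8859-1 (Latin-1) values for ø and æ, whereas UTF-8 encodes each of these characters as two bytes. A minimal Python sketch of the difference follows; the helper names `byte_forms` and `is_valid_utf8` are hypothetical and only for illustration.

```python
# Compare how æ, ø and å are encoded in UTF-8 versus ISO-8859-1 (Latin-1).

def byte_forms(ch):
    """Return (utf8_bytes, latin1_bytes) for a single character."""
    return ch.encode("utf-8"), ch.encode("latin-1")


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for ch in "æøå":
    utf8, latin1 = byte_forms(ch)
    print(f"{ch}: utf-8={utf8!r} latin-1={latin1!r}")

# A lone Latin-1 byte such as 0xF8 is not valid UTF-8 on its own.
print(is_valid_utf8(b"\xf8"))
print(is_valid_utf8("ø".encode("utf-8")))
```

If indexed events show single bytes such as '\xF8', the bytes reaching the indexer were probably not UTF-8 at that point, whatever the original file looks like in an editor.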

When I open the file using Notepad++ and/or Sublime Text, the special characters appear correctly.
Notepad++ also reports the encoding as UTF-8-BOM.
I also checked the encoding on a *nix machine using the file command, which returned:
Filename.csv: UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators.
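
A BOM at the start of a file does not guarantee that every later line is valid UTF-8. As a rough cross-check, the following Python sketch reports whether the data starts with a UTF-8 BOM and which line numbers fail to decode as UTF-8. The function name `find_non_utf8_lines` and the sample bytes are made up for illustration; they are not taken from the actual file.

```python
import codecs


def find_non_utf8_lines(data):
    """Return (has_utf8_bom, line_numbers_that_fail_utf8_decoding)."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    bad_lines = []
    # bytes.splitlines() handles both LF and CRLF line terminators.
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return has_bom, bad_lines


# Hypothetical sample: a UTF-8 header line followed by a Latin-1 encoded line.
sample = (
    codecs.BOM_UTF8
    + "navn|værdi\r\n".encode("utf-8")
    + "blåbær\r\n".encode("latin-1")
)
print(find_non_utf8_lines(sample))  # prints (True, [2])

# To check a real file, read it in binary mode, for example:
#     with open("Filename.csv", "rb") as handle:
#         print(find_non_utf8_lines(handle.read()))
```

Running a check like this on the file as it arrives at the input, rather than on the original export, can help narrow down where in a pipeline the encoding changes.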

I have tried configuring props.conf for the input with both:
- CHARSET=AUTO
- CHARSET=UTF-8
But neither of these seems to solve my issue...

I have also tried exporting my CSV file as Unicode and indexing it with the charset set to AUTO and to UCS-2LE, which resulted in many lines being interpreted as Chinese symbols.

 

Has anyone experienced and solved something similar?

1 Solution

Vardhan
Contributor

Hi @nc_lks ,

To resolve this issue, first ingest the data into Splunk through the Add Data option, then go to the advanced settings, select the charset, and try the available encodings; one of them will definitely work.

nc_lks
Engager

Hi @Vardhan,

Thank you very much for the answer. I hadn't actually thought of that...

The flaw turned out to be an encoding error introduced for a specific file at some point in the pipeline.
