Splunk cannot index and search Charset UTF-8 without BOM

kambiu · ‎05-22-2014

I have files encoded as UTF-8 without a BOM (according to Notepad++), but Splunk cannot index or search the events in these files. Due to some limitations, I cannot control the encoding of the files. Does Splunk support UTF-8 without a BOM?

MuS · ‎05-22-2014

Hi kambiu,

Some background first: the UTF-8 BOM is a sequence of three bytes (EF BB BF) at the start of a file that allows the reader to identify the file as a UTF-8 file.

Normally, the BOM is used to signal the endianness of the encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
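To check whether a given file actually starts with the BOM bytes described above, a small script can help. This is only a minimal sketch in Python; the function names `has_utf8_bom` and `decode_utf8` are made up for illustration and have nothing to do with Splunk itself.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 data whether or not it carries a BOM.

    The 'utf-8-sig' codec drops a leading BOM if one is present and
    otherwise behaves like plain 'utf-8'.
    """
    return data.decode("utf-8-sig")
```

For a real file, you could read the first few bytes with `open(path, "rb").read(3)` and pass them to `has_utf8_bom`.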

According to the Unicode standard, the BOM for UTF-8 files is not recommended:

2.6 Encoding Schemes

... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information.

If you have trouble with this source, you can set CHARSET in props.conf for the input that reads it.
Example: if the files are read by a universal forwarder, set CHARSET in the props.conf on that universal forwarder.
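A props.conf stanza for this could look roughly like the following. It is only a sketch: the source path is a hypothetical placeholder, and you would need to adapt the stanza to your own source or sourcetype.

```
# props.conf on the forwarder that reads the files
# NOTE: the path below is a made-up example, replace it with your own source
[source::/path/to/your/files/*.log]
CHARSET = UTF-8
```

A restart of the forwarder is normally needed before a props.conf change takes effect, and the setting only applies to data indexed after the change.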

hope this helps ...

cheers, MuS

kambiu · ‎05-25-2014

Thanks for your answer. I think you are right, and the BOM does matter for indexing in Splunk. I have found another way to solve the issue. Thanks 🙂

Carolina · ‎04-08-2020

Hi,

How did you solve your problem? In my case, it only indexes part of the data and then indexing is canceled.
