Splunk Search

Splunk cannot index and search Charset UTF-8 without BOM

kambiu
New Member

I have files encoded as UTF-8 without BOM (found out in Notepad++), but Splunk cannot index or search the events in these files. Due to some limitations, I cannot control the encoding of the files. Does Splunk support the UTF-8 without BOM charset?

MuS
SplunkTrust

Hi kambiu,

Some background first: the UTF-8 BOM is a sequence of three bytes (EF BB BF) at the start of a file that allows the reader to identify the file as a UTF-8 file.

Normally, the BOM is used to signal the endianness of the encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

According to the Unicode standard, the BOM for UTF-8 files is not recommended:

2.6 Encoding Schemes

... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information.
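To confirm outside of Notepad++ whether a file really starts with the BOM bytes mentioned above, a quick check can help. This is only a minimal sketch in Python; the helper names `has_utf8_bom` and `decode_utf8` are made up for illustration and have nothing to do with Splunk itself.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    The "utf-8-sig" codec accepts input both with and without a BOM.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
    without_bom = "hello".encode("utf-8")
    print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
    print(decode_utf8(with_bom) == decode_utf8(without_bom))
```

For a real file, read the first few bytes in binary mode (`open(path, "rb").read(3)`) and pass them to `has_utf8_bom`.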

If you have trouble with this source, you can set a CHARSET in props.conf on the instance that reads the input for this source.
Example: if the files are read by a universal forwarder, add the CHARSET setting to props.conf on that universal forwarder.
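A rough sketch of what such a stanza could look like is below. The source path is a hypothetical placeholder, so replace it with the path (or sourcetype) of your own input. As far as I know, UTF-8 is already the default for CHARSET, so setting it explicitly mostly rules out a wrong setting inherited from somewhere else.

```
# props.conf on the universal forwarder (or whichever instance reads the files)
# The source path below is a made-up example, not taken from this thread.
[source::/var/log/myapp/*.log]
CHARSET = UTF-8
```

If the files turn out to be in a different encoding than expected, the same setting can name that encoding instead. A restart of the forwarder is normally needed before the change takes effect.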

hope this helps ...

cheers, MuS

kambiu
New Member

Thanks for your answer. I think you are right and the BOM does matter for indexing in Splunk. I have found another way to solve the issue. Thanks 🙂

Carolina
Engager

Hi,

How did you solve your problem? In my case it only indexes part of the file and then indexing is canceled.