Splunk Search

Splunk cannot index and search Charset UTF-8 without BOM

kambiu
New Member

I have files encoded as UTF-8 without a BOM (as reported by Notepad++), but Splunk cannot index or search the events in these files. Due to some limitations, I cannot control the encoding of the files. Does Splunk support the UTF-8 charset without a BOM?

0 Karma

MuS
SplunkTrust

Hi kambiu,

Some background first: the UTF-8 BOM is a sequence of bytes (EF BB BF) at the start of a file that allows the reader to identify the file as a UTF-8 file.

Normally, the BOM is used to signal the endianness of the encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

According to the Unicode standard, a BOM is not recommended for UTF-8 files:

2.6 Encoding Schemes

... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 16.8, Specials, for more information.

If you have trouble with this source, you can set a CHARSET in props.conf on the instance that reads the source.
Example: if the files are read by a universal forwarder, set the CHARSET in the props.conf on that universal forwarder.
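
A minimal sketch of such a stanza, assuming the files are monitored under a path like the one below. The path is hypothetical; replace it with your own source, or use a sourcetype stanza instead:

```
# props.conf on the universal forwarder that reads the files
# The source path below is a placeholder, not taken from the original question
[source::/var/log/myapp/*.log]
# Tell Splunk to treat the input as UTF-8
CHARSET = UTF-8
```

If you are not sure what encoding the files really use, CHARSET = AUTO asks Splunk to try to detect it. The forwarder normally needs a restart before a props.conf change takes effect.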

hope this helps ...

cheers, MuS

0 Karma

kambiu
New Member

Thanks for your answer. I think you are right, and the BOM does matter for indexing in Splunk. I have found another way to solve the issue. Thanks 🙂

0 Karma

Carolina
Engager

Hi,

How did you solve your problem? In my case, Splunk indexes only part of the file and then the indexing is canceled.

0 Karma