I have a UTF-16 CSV file with a 0xFFFE byte order mark and the CSV field names in the first line.
I have defined the charset for that input type to be UTF-16LE, which works for the data; however, the field names are extracted incorrectly, e.g. the field name 'Company' is shown as
x00C_x00o_x00m_x00p_x00a_x00n_x00y_x00
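That mangled name is what 'Company' looks like when UTF-16LE bytes are read as a single-byte encoding: every ASCII character is followed by a NUL byte, and the NULs get escaped into those x00 sequences. A quick way to see the byte layout (assuming `iconv` and `od` are available):

```shell
# Encode "Company" as UTF-16LE and dump the bytes: each letter is
# followed by a NUL (\0), which a header parser that ignores CHARSET
# will escape into the x00 sequences shown above.
printf 'Company' | iconv -f UTF-8 -t UTF-16LE | od -An -c
```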
I have tried various ways to fix this.
First, skipping the first line using the following in props.conf:
PREAMBLE_REGEX = \ufffe
FIELD_NAMES = id,username,firstname,lastname,company,time,ipaddress
but then I get no named fields at all, and the first line is indexed as a record.
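For reference, the full stanza I'm describing looks roughly like this (the sourcetype name and the INDEXED_EXTRACTIONS line are my assumptions about a typical CSV setup, not something confirmed above):

```ini
# props.conf -- sourcetype name is a placeholder
[my_utf16_csv]
CHARSET = UTF-16LE
INDEXED_EXTRACTIONS = csv
PREAMBLE_REGEX = \ufffe
FIELD_NAMES = id,username,firstname,lastname,company,time,ipaddress
```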
Second, I tried FIELD_HEADER_REGEX, also with no luck.
Edit: I also tried PREAMBLE_REGEX as \ufeff, and I also removed the BOM entirely, but it still produces the incorrectly decoded UTF-16 field names.
I converted the file to UTF-8 and it works fine, but that's not a practical solution in the live environment.
Has anyone got UTF-16 + CSV working?
The fix for SPL-78590 has been released in 6.0.2+. Thank you.
I am having the same problem. Did you ever find a solution?
Thanks,
Joe
I opened a case with Splunk. Bug SPL-78590 has been logged for this issue. It seems the field extraction does not take the specified CHARSET into consideration.
No Splunk-side solution, but depending on your platform you can convert the files to UTF-8 first. I used
iconv -f UTF-16LE -t UTF-8
to convert the file (note that iconv expects the encoding name spelled UTF-16LE).
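In case it helps, here is the round trip sketched end to end with a generated sample file (the field names and data are just examples). One gotcha: when you name an explicit endianness like UTF-16LE, iconv passes the BOM through as U+FEFF into the UTF-8 output, so I strip the two BOM bytes first:

```shell
# Build a sample UTF-16LE CSV with a leading 0xFF 0xFE BOM (hypothetical data)
printf 'Company,Time\nAcme,12:00\n' | iconv -f UTF-8 -t UTF-16LE > body.bin
{ printf '\377\376'; cat body.bin; } > sample-utf16.csv

# Drop the 2-byte BOM, then convert; otherwise the BOM would survive
# into the UTF-8 file as the bytes EF BB BF at the start of the header
tail -c +3 sample-utf16.csv | iconv -f UTF-16LE -t UTF-8 > sample-utf8.csv

head -1 sample-utf8.csv   # prints a clean "Company,Time" header
```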
Yes, the values are fine; only the header names are wrong.
Just want to be certain: the field values are decoded okay, but the field names are not. Is that correct?