I am indexing some logs and I see some events are filled with "\x00" while some other events are indexed correctly.
Such behavior is typically observed when a Unix Splunk instance indexes Windows log files over a mounted share. Windows often writes its logs in an encoding the Unix instance is not expecting (commonly UTF-16), so the extra bytes get indexed as nulls "\x00". When indexing Windows logs such as IIS, Exchange, domain controller logs, and so on, install Splunk as a forwarder on the Windows box and have it forward to the Unix Splunk indexer.
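As a rough sketch of that setup, the forwarder on the Windows box would monitor the logs locally and send them to the Unix indexer. The monitor path, sourcetype name, and indexer host/port below are placeholders for your environment, not required values:

# inputs.conf on the Windows Universal Forwarder (path and sourcetype are examples)
[monitor://C:\inetpub\logs\LogFiles]
sourcetype = iis
disabled = 0

[WinEventLog://Security]
disabled = 0

# outputs.conf on the Windows Universal Forwarder (replace host/port with your indexer)
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = unix-indexer.example.com:9997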
Nope. UTF-16 turned a lot of the text into Asian characters.
I'm seeing an input from Windows onto Unix that has \x00 and some other unknown characters interspersed with the data. Is UTF-16 the answer to this?
Ideally, if you are having this symptom, you should be clear about the pattern of null characters (\x00). Are they interleaved with the expected data? Do they come in small bursts (ten bytes or so)? Are there several kilobytes worth of nulls all at once?
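A quick way to answer those questions is to look at the raw bytes of the file before Splunk reads it. This is only a diagnostic sketch (the file path is a placeholder): a UTF-16 file typically starts with a byte-order mark (\xff\xfe or \xfe\xff) and/or has nulls interleaved every other byte, while a long solid run of nulls usually points at something else entirely, such as a sparse or pre-allocated file.

# check_nulls.py - rough heuristic for the \x00 pattern (file path is a placeholder)
with open("/mnt/winlogs/ex2024.log", "rb") as f:
    head = f.read(64)

print(head)  # e.g. b'\xff\xfeI\x00n\x00f\x00o\x00...' suggests UTF-16LE

nulls = head.count(b"\x00")
if head[:2] in (b"\xff\xfe", b"\xfe\xff"):
    print("UTF-16 byte-order mark found -> set CHARSET in props.conf")
elif nulls == len(head):
    print("Solid run of nulls -> probably not a charset problem")
elif 0 < nulls <= len(head) // 2:
    print("Nulls interleaved with text -> likely UTF-16 without a BOM")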
This also happens sometimes when the data you're indexing is encoded with a different character set, such as UTF-16.
You can specify the charset for that sourcetype in your props.conf:
[somesourcetype]
CHARSET = UTF-16
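If plain UTF-16 gives you garbled (e.g. CJK-looking) characters, as mentioned above, the byte order may be the problem. A sketch assuming the data is little-endian (the sourcetype name is just an example):

[somesourcetype]
# UTF-16LE / UTF-16BE are also accepted CHARSET values; try the explicit
# byte order if generic UTF-16 produces the wrong characters
CHARSET = UTF-16LE

CHARSET = AUTO is another option worth trying if the files are a mix of encodings.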
So, do we create props.conf on Forwarder TA or Indexer TA?