Seeing null (\x00) bytes in indexed data from a log file on Windows

smwirt
Path Finder

I have seen several questions regarding null (\x00) bytes in data, but none have helped me resolve my issue so far.

I am trying to read a log file from Sophos using Universal Forwarders. I have done the following so far:

Added a new sourcetype in Splunk Web.

props.conf on the indexer:

[my_sourcetype]
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y%m%d %H%M%S
TZ = UTC
pulldown_type = 1
CHARSET = UTF-16LE

Modified inputs.conf on the forwarders:

[monitor://C:\ProgramData\Sophos\Sophos Device Control\logs]
sourcetype=my_sourcetype

Sample data from C:\ProgramData\Sophos\Sophos Device Control\logs\DeviceControl.txt:

20131001 150737 Device control has started on this machine.
20131003 131815 Device control has started on this machine.

When I search sourcetype="my_sourcetype", I see data, but it looks like this:

\x002\x000\x001\x003\x00 \x001\x005\x000\x007\x003\x007....

If I copy that data into Notepad, replace \x00 with nothing, then I see the data that I expect.
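The pattern above is what UTF-16LE text looks like when its bytes are read with a single-byte charset. Each ASCII character is stored as two bytes: the character itself followed by 0x00. The short Python sketch below is only an illustration, not how Splunk works internally. It encodes the first sample line as UTF-16LE, reads the bytes back as UTF-8, and shows that both deleting the nulls (the Notepad workaround) and decoding with the right charset recover the original text.

```python
# Illustration only: UTF-16LE bytes read back with the wrong charset.
line = "20131001 150737 Device control has started on this machine."

# UTF-16LE stores each ASCII character as two bytes: the character, then 0x00.
raw = line.encode("utf-16-le")

# Reading those bytes as UTF-8 succeeds (0x00 is valid UTF-8), but leaves a
# null after every character, which is the pattern seen in search results.
misread = raw.decode("utf-8")
print(repr(misread[:12]))  # '2\x000\x001\x003\x001\x000\x00'

# The Notepad workaround: deleting the nulls recovers the text...
assert misread.replace("\x00", "") == line

# ...but decoding with the charset the file was written in gets there directly.
assert raw.decode("utf-16-le") == line
```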

Before I left tonight, I noticed that the text file I am reading from is shown in blue in Windows Explorer, which indicates that NTFS compression is enabled on it. Every file in this folder is compressed, and removing compression is not an option.

What do I need to do to have Splunk index this data without the null bytes? All other data, from TA-Windows and other apps, is fine and shows no null bytes.

Update 10/17/13:

Wanted to clarify that this is Splunk 4.3.3 on Windows Server 2008 R2 SP1, with Windows 7 SP1 x64 hosts running the Universal Forwarder. Upgrading Splunk is not an option at this time, but we are pushing to do so in the near future.

/etc/system/local/outputs.conf on the forwarder:

[tcpout]
defaultGroup = 1.2.3.4_9997

[tcpout:1.2.3.4_9997]
server = 1.2.3.4:9997

[tcpout-server://1.2.3.4:9997]

/etc/system/local/inputs.conf on the indexer:

[default]
host = my_hostname

[script://$SPLUNK_HOME\bin\scripts\splunk-admon.path]
disabled = 0

[script://$SPLUNK_HOME\bin\scripts\splunk-perfmon.path]
disabled = 0

.... (two more script stanzas)

[monitor://C:\ProgramData\Sophos\Sophos Device Control\logs]
sourcetype=my_sourcetype

Again, all other data coming from the forwarders looks fine, with no null bytes. Only the data from Sophos is affected. I am also seeing events in Splunk whose entire text is a single null character (\x00).
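The stray leading \x00 and the events made of nothing but \x00 can be reproduced by splitting unconverted UTF-16LE text on line breaks. The Python sketch below is a rough simulation, not Splunk's actual pipeline, under two assumptions: events are cut with a pattern equivalent to Splunk's default LINE_BREAKER ([\r\n]+) (SHOULD_LINEMERGE is false in this props.conf), and the file uses Windows CRLF line endings with no byte-order mark. Because CR and LF are each followed by 0x00, the breaker stops between them and leaves a lone null behind.

```python
import re

# Two sample lines with Windows CRLF endings, stored as UTF-16LE (no BOM assumed).
text = ("20131001 150737 Device control has started on this machine.\r\n"
        "20131003 131815 Device control has started on this machine.\r\n")
raw = text.encode("utf-16-le")

# Assumption: events are cut with a pattern like the default LINE_BREAKER
# ([\r\n]+), applied to text that was NOT converted from UTF-16LE first.
misread = raw.decode("utf-8")
events = [e for e in re.split(r"[\r\n]+", misread) if e]

# In UTF-16LE, CR and LF are each followed by 0x00, so the breaker stops at
# the null between them: one event is just "\x00", and the next line's event
# starts with a stray "\x00".
lone_nulls = [e for e in events if e == "\x00"]
print(len(lone_nulls), "events consisting only of \\x00")
print(repr(events[2][:8]))  # '\x002\x000\x001\x003'

# Converting the charset before splitting gives two clean events.
clean = [e for e in re.split(r"[\r\n]+", raw.decode("utf-16-le")) if e]
assert clean == text.splitlines()
```

If this simulation is close to what happens, it would also fit the thread's finding that CHARSET only helped once it was set in props.conf on the forwarders.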

smwirt
Path Finder

Issue resolved for now: I had to set CHARSET = UTF-16LE in props.conf on the forwarders as well as on the indexer. I had mistakenly been putting the CHARSET line into inputs.conf on the forwarders.

kellycocat
Explorer

This also worked for me; I had exactly the same issue you described. The key was to put the CHARSET setting in props.conf on both the UF and the indexer; otherwise it didn't work.

sshres5
Communicator

So, do we need to install UTF-16LE on the indexer server to decode it? My server only has UTF-8.

smwirt
Path Finder

I followed the instructions here (http://answers.splunk.com/answers/83790/how-do-i-remove-x00-characters-from-my-log-message) to remove the nulls before indexing (by editing props.conf), and the data looks normal now. However, for at least one host so far, the timestamp is wrong: the entry that now looks correct has the same timestamp as the earlier entries that contained \x00 bytes. For another host, the timestamp is parsed correctly from the data.

I'd rather it be correctly processed up front instead of replacing nulls, but if I can get the timestamp correct I can live with it.
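One possible explanation for the wrong timestamp, offered as an assumption rather than something confirmed in this thread: the timestamp is extracted while the nulls are still in the text, so TIME_FORMAT = %Y%m%d %H%M%S cannot match, and Splunk falls back to another timestamp, such as the one from the previous event. Stripping the nulls afterwards would then fix how the event looks but not the timestamp already assigned. The Python sketch below only illustrates the matching part, using strptime with the same format string; parse_ts is a hypothetical helper, not a Splunk function.

```python
from datetime import datetime

# Same format string as TIME_FORMAT in the sourcetype's props.conf.
TIME_FORMAT = "%Y%m%d %H%M%S"

def parse_ts(text):
    """Hypothetical helper: return a datetime, or None if text doesn't match."""
    try:
        return datetime.strptime(text, TIME_FORMAT)
    except ValueError:
        return None

clean = "20131001 150737"
# The same timestamp as it appears when UTF-16LE is read one byte at a time.
misread = "".join(ch + "\x00" for ch in clean)

print(parse_ts(clean))                        # 2013-10-01 15:07:37
print(parse_ts(misread))                      # None: the nulls break the match
print(parse_ts(misread.replace("\x00", "")))  # 2013-10-01 15:07:37
```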

smwirt
Path Finder

Just noticed this line in splunkd.log on the indexer:

WARN  UTF8Processor - Using charset UTF-8 for events from 'UTF-16LE', as the monitor is believed over the raw text which may be source:C:\ProgramData\Sophos\Sophos Device Control\logs\DeviceControl.txt|host::my_host|my_sourcetype|remoteport::56789