In our WebSphere environment we successfully index all SystemOut.log and SystemErr.log files except those for one cluster and its members. The problem is that one of the applications logs output from CICS to SystemOut, and I have been told that this output is EBCDIC-encoded. As a result, Splunk rejects the file with the following messages:
02-12-2014 10:46:26.505 +0100 INFO TailingProcessor - Ignoring file 'E:\logs\MyCluster\SystemOut.log' due to: binary
02-12-2014 10:46:26.505 +0100 WARN FileClassifierManager - The file 'E:\logs\MyCluster\SystemOut.log' is invalid. Reason: binary
For the deployment app defining this file I have created a props.conf file in the folder
D:\Splunk\etc\deployment-apps\inputs_prod\default
I have tried all of the settings below, without success:
[source::E:\\logs\\MyCluster\\SystemOut.log]
CHARSET = utf-ebcdic
#CHARSET = auto
#NO_BINARY_CHECK = 1
First, I am not totally sure that the location of the props.conf is correct, but I do believe so.
Secondly, without really diving into the details and changing the application and how it logs, is it possible to configure Splunk to index the file?
To my knowledge, Splunk cannot index a binary file; however, the data from the file can be indexed once it is in a non-binary format. There are two approaches you could take:
I've done a bit of EBCDIC in my time 🙂
You will need to decode the EBCDIC and encode in ASCII.
You might do this in a scripted input or modular input or pre-process the EBCDIC content before sending to Splunk.
The decoding is trivial in Python:
# Python 3: the EBCDIC data must be bytes, not str
ebcdic_bytes = b'\xc8\xc5\xd3\xd3\xd6'
# 'EBCDIC-CP-BE' is an alias for the cp500 code page
print(ebcdic_bytes.decode('EBCDIC-CP-BE'))
# prints out HELLO
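If you go the pre-processing route, a minimal Python 3 sketch might look like the one below. It assumes the whole file is encoded in a single EBCDIC code page (cp500 here, which may not be the one your CICS region actually uses) and that you can write a converted copy somewhere Splunk monitors. The function name and the paths in the usage comment are hypothetical. If only part of SystemOut.log is EBCDIC, this will not work as-is and you would have to decode only those sections.

```python
def convert_ebcdic_file(src_path, dst_path, codec="cp500", chunk_size=64 * 1024):
    """Decode src_path from an EBCDIC code page and write it out as UTF-8.

    Assumption: the entire file uses one code page (cp500 by default).
    """
    # newline="" turns off newline translation so line endings pass through as decoded
    with open(src_path, "r", encoding=codec, errors="replace", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)  # read decoded text in chunks
            if not chunk:
                break
            dst.write(chunk)


# Hypothetical usage: point the Splunk monitor input at the converted copy
# convert_ebcdic_file(r"E:\logs\MyCluster\SystemOut.log",
#                     r"E:\logs\MyCluster\SystemOut.converted.log")
```

This converts the whole file on every run. For a log that keeps growing, you would need to remember the last byte offset and convert only the new data, or emit the decoded text from a scripted input instead.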