I've already tried that. I've been playing with this all day.
The situation is that I am indexing a binary-encoded log file. Splunk indexes all ASCII characters without a problem, but a few non-ASCII characters are indexed as "\x6\xD1", which should be "Đ".
I've tried modifying the CHARSET but the only one that works is CP852, which is sadly not supported by Splunk.
As for SED, I have not been able to match the pattern with a sed regex within Splunk; however, with standalone regex tools or the OS X command-line sed I can match and replace the patterns without problems.
I have managed to work this out using transforms.conf with a regex and then applying that transform in props.conf multiple times (e.g. 10 times for a possible 10 repetitions of the same character in one event). This is a very ugly workaround and I will try to find another way.
Example:
raw data: \x6\xD1\x6\xD1\x6\xD1\x6\xD1\x6\xD1
transforms.conf
[bin2text]
REGEX = (?)(.*)\x6\\xD1(.*)
FORMAT = $1Đ$2
props.conf
[sourcetype]
TRANSFORMS-test = bin2text, bin2text, bin2text, bin2text, bin2text
result data: ĐĐĐĐĐ
As I said, a very ugly solution, but the only one I got working. I'm open to suggestions if someone has an idea.
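For anyone wondering why the transform has to be listed once per repetition, here is a minimal Python sketch of the same regex shape. It runs outside Splunk, so it only approximates the behaviour, and it assumes the event really contains the literal text \x6\xD1 (backslashes included). The names RAW, PATTERN, apply_once and apply_n are made up for the illustration. Because the leading (.*) is greedy and the pattern consumes the whole event, each pass can rewrite only one occurrence (the last one), so five occurrences need five passes.

```python
import re

# Literal text as assumed to appear in the event: backslash, x, 6, backslash, x, D, 1.
RAW = r"\x6\xD1" * 5

# Same shape as the transform regex: greedy prefix, one escaped sequence, remainder.
PATTERN = re.compile(r"(.*)\\x6\\xD1(.*)", re.DOTALL)


def apply_once(text: str) -> str:
    """One pass, like listing the transform once: rewrites a single occurrence."""
    return PATTERN.sub(lambda m: m.group(1) + "Đ" + m.group(2), text, count=1)


def apply_n(text: str, passes: int) -> str:
    """Apply the same rewrite repeatedly, like listing the transform several times."""
    for _ in range(passes):
        text = apply_once(text)
    return text


if __name__ == "__main__":
    print(apply_once(RAW))   # four escaped sequences remain, followed by one Đ
    print(apply_n(RAW, 5))   # all five occurrences rewritten
    # A plain global replace does the same in one step (what a sed-style s///g would do):
    print(RAW.replace(r"\x6\xD1", "Đ"))
```

A global substitution would avoid the repetition entirely, which is why a working sed-style replacement would still be the nicer fix if it can be made to match inside Splunk.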