
Is captured Stream data transcoded to UTF-8, or is there a setting to specify the character encoding for particular data?


Hi,

I am experimenting with the Splunk App for Stream, capturing Samba (SMB) traffic between a CentOS Samba server and Windows clients.

When I create a directory with a Japanese name from Windows 8.1, some of the captured data is garbled and some is not, depending on the SMB command.

[Screenshot of the captured data]

This made me wonder whether the captured data is transcoded to UTF-8, and whether there is a way to specify the character encoding for a particular stream.

Is there any configuration for the character set of Stream capture data?

Thank you in advance.

1 Solution

Splunk Employee

All data in Splunk indexes is stored as UTF-8.

For log contents, however, you can specify the source encoding to convert from by setting CHARSET in props.conf.
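
To illustrate why declaring the source encoding matters, here is a small Python sketch of the general idea. It is not Splunk's implementation; the sample name and the `to_utf8` helper are hypothetical, and Shift_JIS is just an assumed example of a legacy Japanese charset.

```python
# Hypothetical sample: a Japanese directory name written by a legacy source.
name = "新しいフォルダー"

# Bytes as a Shift_JIS source would emit them (assumed charset for this demo).
raw_sjis = name.encode("shift_jis")

# Treating those bytes as UTF-8 without conversion produces garbled text.
garbled = raw_sjis.decode("utf-8", errors="replace")


def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Transcode bytes from a declared source charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# With the correct source charset declared, the text survives intact.
clean = to_utf8(raw_sjis, "shift_jis").decode("utf-8")

print(garbled)
print(clean)
```

The same bytes come out garbled or readable depending only on which source charset is assumed, which is the role a declared charset plays for log contents.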

For filenames, though, there is no such functionality. On UNIX we assume filenames are UTF-8, and on Windows we assume they are UTF-16. On Windows this is pretty much always true, but on UNIX you can choose to present filenames in other encodings.
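
As a rough illustration of what goes wrong when a fixed filename encoding is assumed, the Python sketch below decodes the same name from three byte representations. The sample name, the `decode_as_assumed` helper, and the choice of EUC-JP as the legacy Unix locale are all assumptions made for the demo, not anything taken from Splunk.

```python
# Hypothetical sample filename.
name = "テスト"

utf16_bytes = name.encode("utf-16-le")  # Windows-style UTF-16 representation
utf8_bytes = name.encode("utf-8")       # typical modern Unix representation
eucjp_bytes = name.encode("euc_jp")     # legacy Unix locale (assumed for demo)


def decode_as_assumed(raw: bytes, assumed: str):
    """Decode raw filename bytes under a fixed assumption; None on failure."""
    try:
        return raw.decode(assumed)
    except UnicodeDecodeError:
        return None


# The assumptions hold when the filename really uses the expected encoding.
windows_result = decode_as_assumed(utf16_bytes, "utf-16-le")
unix_result = decode_as_assumed(utf8_bytes, "utf-8")

# A Unix filename in another encoding cannot be read under the UTF-8 assumption.
legacy_result = decode_as_assumed(eucjp_bytes, "utf-8")

print(windows_result, unix_result, legacy_result)
```

The first two results recover the name, while the EUC-JP bytes fail to decode as UTF-8, which is the situation a non-UTF-8 Unix filename creates.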

For your own sanity, I strongly recommend using only UTF-8 filenames on modern Unix systems. However, if your use case and goals require you to do otherwise, please tell us about it in a support ticket. This is a known limitation with an entry in our work database, but understanding why it matters to you will help us prioritize it.

