Is captured Stream data transcoded to UTF-8, or is there a configuration to specify the character encoding for particular data?

melonman
Motivator

Hi,

I am experimenting with the Splunk App for Stream, capturing SMB traffic between a CentOS Samba server and Windows clients.

When I created a directory with a Japanese name from Windows 8.1, some of the captured data was garbled and some was not, depending on the SMB command.

Here is the screenshot:

[Image: screenshot of the captured data]

My question is whether the captured data is transcoded to UTF-8, or whether there is a way to specify a character encoding for particular stream data.

Is there any configuration for the character set of Stream capture data?

Thank you in advance.

jrodman
Splunk Employee

All data in Splunk indexes is stored as UTF-8.

However, for log contents, you can specify the encoding to convert from by using the CHARSET setting in props.conf.
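To illustrate what "specify the encoding to convert from" means in practice, here is a minimal Python sketch. It is not Splunk's implementation; the sample name and the `to_utf8` helper are hypothetical, and Shift_JIS is only an assumed example of a non-UTF-8 source encoding.

```python
# Hypothetical sample: a Japanese directory name stored as Shift_JIS bytes.
name = "新しいフォルダー"
raw = name.encode("shift_jis")


def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Decode bytes using the declared source charset, then re-encode as UTF-8."""
    return data.decode(source_charset).encode("utf-8")


# With the correct source charset declared, the text survives the conversion.
converted = to_utf8(raw, "shift_jis")
print(converted.decode("utf-8"))
```

The conversion only works when the declared source encoding matches the bytes that actually arrive, which is the role a setting like CHARSET plays for log contents.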

On the third hand, though, for filenames, there is no such functionality. On UNIX we assume filenames are UTF-8, and on Windows we assume they are UTF-16. On Windows this is pretty much always true, but on UNIX you can choose to present filenames in other encodings.
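The effect of a wrong encoding assumption can be sketched in Python. This is only an illustration under assumed inputs: the sample name and the `decode_assuming_utf8` helper are hypothetical, and nothing here reflects how Splunk reads filenames internally.

```python
# Hypothetical sample name, encoded three different ways.
name = "テスト"
utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")
utf16_bytes = name.encode("utf-16-le")


def decode_assuming_utf8(data: bytes) -> str:
    """Interpret bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return data.decode("utf-8", errors="replace")


# Only the bytes that really are UTF-8 come back intact.
for label, data in [("utf-8", utf8_bytes),
                    ("shift_jis", sjis_bytes),
                    ("utf-16-le", utf16_bytes)]:
    text = decode_assuming_utf8(data)
    print(label, repr(text), "intact" if text == name else "garbled")
```

A name that is garbled in one place and readable in another is consistent with this kind of mismatch, where the same text arrives in different encodings but is always read as UTF-8.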

For your own sanity, I strongly recommend using only UTF-8 filenames on modern Unix systems. However, if your use case and goals require you to do otherwise, please tell us about this in a support ticket. This is a known limitation with an entry in our work database, but understanding why it matters to you will help us prioritize it.