Is captured Stream data transcoded to UTF-8, or is there a configuration to specify the character encoding for certain data?

melonman
Motivator

Hi,

I am experimenting with the Splunk App for Stream, capturing Samba traffic between a CentOS Samba server and Windows clients.

When I created a directory with a Japanese name from Windows 8.1, some of the captured data is garbled and some is not, depending on the SMB command.

[Screenshot of the captured Stream data; image not available]

This made me wonder whether the captured data is transcoded into UTF-8, or whether there is a way to specify the character encoding for particular stream data.

Is there any setting for the character set of Stream capture data?

Thank you in advance.

jrodman
Splunk Employee

All data in Splunk indexes is stored as UTF-8.

However, for log contents, you can specify the encoding to convert from by setting CHARSET in props.conf.
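Conceptually, CHARSET tells Splunk which encoding the raw bytes are in, so they can be decoded and re-encoded as UTF-8. The Python sketch below only illustrates that conversion step; it is not Splunk's implementation, and the helper name to_utf8 and the Shift-JIS source encoding are assumptions chosen for a Japanese example. In props.conf itself, the setting would be a line such as CHARSET = SHIFT-JIS under the matching [sourcetype] or [source::<spec>] stanza (the value is an assumed example).

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the declared source charset, then re-encode as UTF-8.

    Hypothetical helper for illustration only; it is not a Splunk API.
    Undecodable bytes become U+FFFD here, which may differ from Splunk's handling.
    """
    text = raw.decode(source_charset, errors="replace")
    return text.encode("utf-8")


# A Japanese name as a Shift-JIS-based client might write it to a log.
original = "日本語フォルダ"
sjis_bytes = original.encode("shift_jis")

# Declaring the correct source charset yields clean UTF-8.
converted = to_utf8(sjis_bytes, "shift_jis")
print(converted.decode("utf-8"))  # 日本語フォルダ

# Treating the same bytes as UTF-8 (no conversion) produces garbled text.
print(sjis_bytes.decode("utf-8", errors="replace"))
```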

On the third hand, though, there is no such functionality for filenames. On UNIX we assume filenames are UTF-8, and on Windows we assume they are UTF-16. On Windows this is nearly always true, but on UNIX filenames can be presented in other encodings.
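The filename rule can be sketched the same way. The helper decode_filename below is hypothetical and simply mirrors the per-platform assumption described above (UTF-16 on Windows, UTF-8 on UNIX); it assumes little-endian UTF-16, which is what Windows uses natively. It shows why a UNIX filename stored in a legacy encoding such as EUC-JP comes out garbled under the UTF-8 assumption.

```python
def decode_filename(raw: bytes, platform: str) -> str:
    """Hypothetical helper: decode a filename using a fixed per-platform assumption.

    Windows names are treated as UTF-16 (little-endian); everything else as UTF-8.
    Bytes that do not fit the assumed encoding become U+FFFD.
    """
    encoding = "utf-16-le" if platform == "windows" else "utf-8"
    return raw.decode(encoding, errors="replace")


name = "日本語フォルダ"

# Both assumptions hold here, so the name round-trips intact.
print(decode_filename(name.encode("utf-16-le"), "windows"))  # 日本語フォルダ
print(decode_filename(name.encode("utf-8"), "unix"))  # 日本語フォルダ

# A UNIX filename stored as EUC-JP violates the UTF-8 assumption and is garbled.
print(decode_filename(name.encode("euc_jp"), "unix"))  # contains U+FFFD characters
```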

For your own sanity, I strongly recommend using only UTF-8 filenames on modern UNIX systems. However, if your use case and goals require otherwise, please tell us about it in a support ticket. This is a known limitation with an existing entry in our work database, but understanding why it matters to you will help us prioritize it.