Is captured Stream data transcoded to UTF-8, or is there a configuration to specify the character encoding for certain data?

melonman
Motivator

Hi,

I am experimenting with the Splunk App for Stream, capturing Samba (SMB) traffic between a CentOS Samba server and Windows clients.

When I created a directory with a Japanese name from Windows 8.1, some of the captured data was garbled and some was not, depending on the SMB command.

Here is the screenshot: [image]

This made me wonder whether the captured data is transcoded to UTF-8, or whether there is a way to specify the character encoding for a particular stream.

Is there a character set configuration for Stream capture data?

Thank you in advance.

jrodman
Splunk Employee

All data in Splunk indexes is stored as UTF-8.

However, for log contents, you can specify the source encoding to convert from by setting CHARSET in props.conf.

On the third hand, though, for filenames there is no such functionality. On UNIX we assume filenames are UTF-8, and on Windows we assume they are UTF-16. On Windows this is pretty much always true, but on UNIX you can choose to present filenames in other encodings.
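
To illustrate why a wrong encoding assumption produces garbled names, here is a minimal Python sketch. It is not Splunk or Stream code; the directory name and the `decode_as_utf8` helper are hypothetical, and Shift-JIS is only assumed as an example of a non-UTF-8 encoding a UNIX system might present. It decodes the same name from three byte encodings while always assuming UTF-8:

```python
# Hypothetical Japanese directory name ("New folder"), for illustration only.
name = "新しいフォルダー"


def decode_as_utf8(raw: bytes) -> str:
    """Decode bytes assuming UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# The same name as it might appear on the wire or on disk.
utf8_bytes = name.encode("utf-8")       # what a UTF-8 reader expects
sjis_bytes = name.encode("shift_jis")   # a legacy Japanese encoding
utf16_bytes = name.encode("utf-16-le")  # the encoding Windows typically uses

# Correct assumption: the name survives intact.
print(decode_as_utf8(utf8_bytes))

# Wrong assumption: the bytes are not valid UTF-8, so the result is garbled.
print(decode_as_utf8(sjis_bytes))
print(decode_as_utf8(utf16_bytes))

# Decoding with the encoding that was actually used recovers the name.
print(sjis_bytes.decode("shift_jis"))
print(utf16_bytes.decode("utf-16-le"))
```

If some fields in a capture are decoded with the right encoding and others are not, the result would look like what is described in the question: the same name readable in one place and garbled in another.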

For your own sanity, I strongly recommend you only use UTF-8 filenames on modern UNIX systems. However, if your use case and goals require you to do otherwise, please tell us about this in a support ticket. This is a known limitation and there is an entry in the work database for it, but understanding why it matters will help us prioritize.

