<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: International character code recognition in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42334#M7869</link>
    <description>&lt;P&gt;It guesses based on the default training it has been given: &lt;A href="http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding#If_Splunk_doesn.27t_recognize_a_character_set"&gt;http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding#If_Splunk_doesn.27t_recognize_a_character_set&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 06 Sep 2010 07:46:24 GMT</pubDate>
    <dc:creator>gkanapathy</dc:creator>
    <dc:date>2010-09-06T07:46:24Z</dc:date>
    <item>
      <title>International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42331#M7866</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;

&lt;P&gt;I would like to know how Splunk handles international character encodings.&lt;/P&gt;

&lt;P&gt;My environment is Japanese. Three character encodings are commonly used for Japanese text: SJIS, EUC, and Unicode.&lt;/P&gt;

&lt;P&gt;My question is: how does Splunk detect the character encoding of an input, how does it store the characters in the index, and how does it handle them during search?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 02 Sep 2010 12:08:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42331#M7866</guid>
      <dc:creator>melonman</dc:creator>
      <dc:date>2010-09-02T12:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42332#M7867</link>
      <description>&lt;P&gt;Splunk tries to auto-detect the character set of an input, but it may guess wrong. You can specify the character set of an input by setting &lt;CODE&gt;CHARSET&lt;/CODE&gt; in &lt;CODE&gt;props.conf&lt;/CODE&gt; on the input node (i.e., the same server where &lt;CODE&gt;inputs.conf&lt;/CODE&gt; is configured). I would recommend specifying it in a &lt;CODE&gt;[source::...]&lt;/CODE&gt; stanza in that file.&lt;/P&gt;
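
&lt;P&gt;For example, a minimal sketch of such a stanza might look like the following. It assumes a hypothetical monitored path and Shift_JIS-encoded files; verify the exact &lt;CODE&gt;CHARSET&lt;/CODE&gt; value against the list of supported character sets in the documentation for your Splunk version:&lt;/P&gt;

&lt;PRE&gt;# props.conf on the input node (same server as inputs.conf)
# Hypothetical path: replace it with the files you actually monitor.
[source::/var/log/myapp/*.log]
# Assumed value for Shift_JIS data; check the supported list for your version.
CHARSET = SHIFT-JIS&lt;/PRE&gt;

&lt;P&gt;Pinning the encoding per source like this means Splunk does not have to guess for files whose encoding you already know; EUC-encoded sources would get their own stanza with the corresponding EUC value from the same supported list.&lt;/P&gt;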

&lt;P&gt;&lt;A href="http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding" rel="nofollow"&gt;http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Sep 2010 23:50:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42332#M7867</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-09-02T23:50:41Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42333#M7868</link>
      <description>&lt;P&gt;I understand the encoding configuration, but I can't find an explanation of how Splunk auto-detects the character set of an input. Is there any documentation on how this auto-detection works?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Sep 2010 08:18:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42333#M7868</guid>
      <dc:creator>melonman</dc:creator>
      <dc:date>2010-09-03T08:18:02Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42334#M7869</link>
      <description>&lt;P&gt;It guesses based on the default training it has been given: &lt;A href="http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding#If_Splunk_doesn.27t_recognize_a_character_set"&gt;http://www.splunk.com/base/Documentation/4.1.4/Admin/Configurecharactersetencoding#If_Splunk_doesn.27t_recognize_a_character_set&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Sep 2010 07:46:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42334#M7869</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-09-06T07:46:24Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42335#M7870</link>
      <description>&lt;P&gt;Thank you for the charset-related information! What I really want to know is the mechanism Splunk uses to detect the character encoding. In HTML and some other formats, the encoding is usually declared in a header within the data itself. In most cases, however, IT data is plain text, and you usually don't know which encoding it was written in. The documentation says that Splunk performs auto-detection, and I want to know how it works.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Sep 2010 09:15:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42335#M7870</guid>
      <dc:creator>melonman</dc:creator>
      <dc:date>2010-09-09T09:15:18Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42336#M7871</link>
      <description>&lt;P&gt;Let me simplify the question: does Splunk use the universal encoding detector to detect the character encoding of inputs?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Sep 2010 10:30:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42336#M7871</guid>
      <dc:creator>melonman</dc:creator>
      <dc:date>2010-09-14T10:30:40Z</dc:date>
    </item>
    <item>
      <title>Re: International character code recognition</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42337#M7872</link>
      <description>&lt;P&gt;No. As noted, it uses heuristics based on its training set. As you pointed out, log files rarely if ever contain encoding indicators, and any that do appear are unreliable.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Sep 2010 11:10:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/International-character-code-recognition/m-p/42337#M7872</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-09-14T11:10:25Z</dc:date>
    </item>
  </channel>
</rss>

