<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: pre-trained source types in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/pre-trained-source-types/m-p/178043#M35702</link>
    <description>&lt;P&gt;The time format is not fixed in log4j, so Splunk cannot assume one format. If your company has standardised on a date format, it is good practice to add TIME_FORMAT to save Splunk from having to test all possibilities. &lt;BR /&gt;
In general, it is good practice to use or clone Splunk's pre-trained source types and, as always, the more you tell Splunk, the less it has to "guess", which reduces indexing load. &lt;/P&gt;

&lt;P&gt;For reference, this link shows some of the possible date formats. &lt;BR /&gt;
&lt;A href="http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout"&gt;http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout&lt;/A&gt;&lt;BR /&gt;
Look for &lt;CODE&gt;date{pattern}&lt;/CODE&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 22 May 2016 10:38:43 GMT</pubDate>
    <dc:creator>bmunson_splunk</dc:creator>
    <dc:date>2016-05-22T10:38:43Z</dc:date>
    <item>
      <title>pre-trained source types</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/pre-trained-source-types/m-p/178042#M35701</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a question regarding best practices for sourcetypes and how pre-trained sourcetypes work.&lt;/P&gt;

&lt;P&gt;I had some Java logs that a member of my group was struggling with, and I suggested that he just use the "log4j" sourcetype. Once that change was made, it worked fine. I've been requiring that certain parameters be set in our sourcetypes, based on Splunk recommendations and the Splunk "Getting Data In, Correctly" document and .conf presentation. In that doc, they recommend:&lt;/P&gt;

&lt;P&gt;TIME_PREFIX&lt;BR /&gt;
TIME_FORMAT&lt;BR /&gt;
MAX_TIMESTAMP_LOOKAHEAD&lt;BR /&gt;
SHOULD_LINEMERGE&lt;BR /&gt;
TRUNCATE...&lt;/P&gt;

&lt;P&gt;We have been using those in all of our props.conf settings. So far, so good. Since we decided to use the pre-trained log4j sourcetype, I checked what the props settings were for that sourcetype by executing "./splunk btool props list log4j". Here's the output:&lt;/P&gt;

&lt;P&gt;[log4j]&lt;BR /&gt;
ANNOTATE_PUNCT = True&lt;BR /&gt;
BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d&lt;BR /&gt;
BREAK_ONLY_BEFORE_DATE = True&lt;BR /&gt;
CHARSET = UTF-8&lt;BR /&gt;
DATETIME_CONFIG = /etc/datetime.xml&lt;BR /&gt;
HEADER_MODE = &lt;BR /&gt;
LEARN_SOURCETYPE = true&lt;BR /&gt;
LINE_BREAKER_LOOKBEHIND = 100&lt;BR /&gt;
LOOKUP-action-for_fs_notification = nix_endpoint_change_action_lookup vendor_action OUTPUT action&lt;BR /&gt;
LOOKUP-dropdowns = dropdownsLookup host OUTPUT unix_category unix_group&lt;BR /&gt;
LOOKUP-object_category-for_fs_notification = nix_endpoint_change_fs_notification_object_category_lookup vendor_object_category OUTPUTNEW object_category&lt;BR /&gt;
MAX_DAYS_AGO = 2000&lt;BR /&gt;
MAX_DAYS_HENCE = 2&lt;BR /&gt;
MAX_DIFF_SECS_AGO = 3600&lt;BR /&gt;
MAX_DIFF_SECS_HENCE = 604800&lt;BR /&gt;
MAX_EVENTS = 256&lt;BR /&gt;
MAX_TIMESTAMP_LOOKAHEAD = 128&lt;BR /&gt;
MUST_BREAK_AFTER = &lt;BR /&gt;
MUST_NOT_BREAK_AFTER = &lt;BR /&gt;
MUST_NOT_BREAK_BEFORE = &lt;BR /&gt;
SEGMENTATION = indexing&lt;BR /&gt;
SEGMENTATION-all = full&lt;BR /&gt;
SEGMENTATION-inner = inner&lt;BR /&gt;
SEGMENTATION-outer = outer&lt;BR /&gt;
SEGMENTATION-raw = none&lt;BR /&gt;
SEGMENTATION-standard = standard&lt;BR /&gt;
SHOULD_LINEMERGE = True&lt;BR /&gt;
TRANSFORMS = &lt;BR /&gt;
TRUNCATE = 10000&lt;BR /&gt;
TZ = US/Eastern&lt;BR /&gt;
detect_trailing_nulls = false&lt;BR /&gt;
maxDist = 75&lt;BR /&gt;
pulldown_type = true&lt;/P&gt;

&lt;P&gt;No TIME_PREFIX, no TIME_FORMAT, which shocks me. Is there a reason for this? Am I better off using a pre-trained sourcetype? Are there performance considerations? Inquiring minds want to know...&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 16:04:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/pre-trained-source-types/m-p/178042#M35701</guid>
      <dc:creator>a212830</dc:creator>
      <dc:date>2020-09-28T16:04:18Z</dc:date>
    </item>
    <item>
      <title>Re: pre-trained source types</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/pre-trained-source-types/m-p/178043#M35702</link>
      <description>&lt;P&gt;The time format is not fixed in log4j, so Splunk cannot assume one format. If your company has standardised on a date format, it is good practice to add TIME_FORMAT to save Splunk from having to test all possibilities. &lt;BR /&gt;
In general, it is good practice to use or clone Splunk's pre-trained source types and, as always, the more you tell Splunk, the less it has to "guess", which reduces indexing load. &lt;/P&gt;
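
&lt;P&gt;As a rough sketch of that advice, a cloned stanza in props.conf might look something like the one below. The stanza name &lt;CODE&gt;my_log4j&lt;/CODE&gt; and the timestamp layout are assumptions for illustration only: the sketch presumes every event begins with a log4j ISO8601-style date such as &lt;CODE&gt;2016-05-22 10:38:43,123&lt;/CODE&gt;. The line-merging and truncation values are simply copied from the btool output in the question. Adjust TIME_FORMAT to whatever pattern your appenders actually emit.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# Hypothetical sourcetype cloned from the pre-trained log4j stanza.
# Assumes events start with a timestamp like: 2016-05-22 10:38:43,123
[my_log4j]
# The timestamp sits at the very start of each event.
TIME_PREFIX = ^
# strptime-style pattern; %3N is three-digit subseconds (milliseconds).
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# The assumed timestamp is 23 characters long, so look no further than that.
MAX_TIMESTAMP_LOOKAHEAD = 23
# Values below are taken from the btool output for [log4j].
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE_DATE = True
TRUNCATE = 10000
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With an explicit prefix, format and lookahead, Splunk only has to try one pattern against the first 23 characters of each event instead of working through its list of known date formats.&lt;/P&gt;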

&lt;P&gt;For reference, this link shows some of the possible date formats. &lt;BR /&gt;
&lt;A href="http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout"&gt;http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout&lt;/A&gt;&lt;BR /&gt;
Look for &lt;CODE&gt;date{pattern}&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 22 May 2016 10:38:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/pre-trained-source-types/m-p/178043#M35702</guid>
      <dc:creator>bmunson_splunk</dc:creator>
      <dc:date>2016-05-22T10:38:43Z</dc:date>
    </item>
  </channel>
</rss>

