pre-trained source types

a212830 — Mon, 28 Sep 2020 16:04:18 GMT

Hi,

I have a question regarding best practices for sourcetypes and how pre-trained sourcetypes work.

A member of my group was struggling with some Java logs, and I suggested that he just use the "log4j" sourcetype. Once that change was made, it worked fine. I've been requiring that certain parameters be set in our sourcetypes, based on Splunk recommendations and the Splunk "Getting Data In, Correctly" document and .conf presentation. In that doc, they recommend:

TIME_PREFIX
TIME_FORMAT
MAX_TIMESTAMP_LOOKAHEAD
SHOULD_LINEMERGE
TRUNCATE...

We have been using those in all of our props.conf settings. So far, so good. Since we decided to use the pre-trained log4j sourcetype, I decided to see what its props settings were by executing "./splunk btool props list log4j". Here's the output:

[log4j]
ANNOTATE_PUNCT = True
BREAK_ONLY_BEFORE = \d\d?:\d\d:\d\d
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
LOOKUP-action-for_fs_notification = nix_endpoint_change_action_lookup vendor_action OUTPUT action
LOOKUP-dropdowns = dropdownsLookup host OUTPUT unix_category unix_group
LOOKUP-object_category-for_fs_notification = nix_endpoint_change_fs_notification_object_category_lookup vendor_object_category OUTPUTNEW object_category
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = True
TRANSFORMS =
TRUNCATE = 10000
TZ = US/Eastern
detect_trailing_nulls = false
maxDist = 75
pulldown_type = true

No TIME_PREFIX, no TIME_FORMAT, which shocks me. Is there a reason for this? Am I better off using a pre-trained sourcetype? Are there performance considerations? Inquiring minds want to know...

Re: pre-trained source types

bmunson_splunk — Sun, 22 May 2016 10:38:43 GMT

The time format is not fixed in log4j, so Splunk cannot assume one format. If your company has standardised on a date format, it would be good practice to add TIME_FORMAT to save Splunk from having to test all the possibilities.
In general, it is good practice to use or clone Splunk's pre-trained source types and, as always, the more you tell Splunk, the less it has to "guess", which reduces indexing load.

For reference, this link shows some of the possible date formats.
http://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout
Look for date{pattern}
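
As a rough illustration of cloning the pre-trained sourcetype and pinning the timestamp, here is a minimal props.conf sketch. It assumes every event starts with a timestamp like 2020-09-28 16:04:18,123; the stanza name my_log4j is hypothetical, and the values would need adjusting to whatever date pattern your log4j layout actually uses.

```
# Hypothetical sourcetype name; choose your own.
[my_log4j]
# Assumption: the timestamp is the first thing on each event's first line.
TIME_PREFIX = ^
# Assumption: timestamps look like 2020-09-28 16:04:18,123
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
# Length of the timestamp above (23 characters), counted from TIME_PREFIX.
MAX_TIMESTAMP_LOOKAHEAD = 23
# Break only before a line that starts with a timestamp, so multi-line
# stack traces stay attached to the event that produced them.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})
TRUNCATE = 10000
```

With explicit settings like these, Splunk should not need to fall back on datetime.xml pattern matching or line merging for this sourcetype, which is where the indexing savings would come from.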
