I have a log file that I need to import into Splunk, and I want the import to be as efficient as possible: there is a LOT of data (gigabytes per day), and I understand that Splunk runs a lot of regex processing on each event to work out the sourcetype if you don't assign one manually.
I used the data import tool to look at the data (sample below). It seems to pick up that the timestamp is split (yes, the date below is 2017-04-12 12:24:49.486000) and it even breaks the events correctly, but the UI does not show which settings it is using to do this. I looked in the Advanced Settings, which show the usual SHOULD_LINEMERGE, CHARSET, etc., but not the time format, because timestamp extraction is set to "Auto", which I hear is fine for demos but a big no-no for production.
12/04/2017 MSI (c) (98:0C) [11:24:49:486]: Resetting cached policy values
I have checked /system/default/props.conf, /etc/datetime.xml and the props.conf stanza for the new sourcetype, but these only show how my sourcetype is configured by default; they do not indicate which configuration Splunk actually ends up using. When I used btool to list the configuration for the stanza, using this command line:
/opt/splunk/bin/splunk btool props list msi_install_logs
I got this:
[msi_install_logs]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = True
CHARSET = UTF-8
DATETIME_CONFIG =
HEADER_MODE =
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
NO_BINARY_CHECK = true
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = True
TRANSFORMS =
TRUNCATE = 10000
category = Custom
detect_trailing_nulls = false
disabled = false
maxDist = 100
priority =
pulldown_type = true
sourcetype =
And annoyingly, the system does not itemise the configurations above.
Does anyone know how I can look into the import process and find out which settings (TIMESTAMP, LINE_BREAKER, etc.) are actually being used?
I am not positive I understand the question, but I'll give it a whirl. It sounds like you configured a sourcetype with timestamp extraction set to auto, and you want to understand what auto does or which settings it is using. That is not a specific setting that can be seen via btool; Splunk applies "under the hood" logic to determine the timestamp.
If you do not wish to use "auto" and would rather configure it explicitly, you could go through the same process of importing the data in the GUI and, this time, select "Advanced" instead of "Auto" when configuring the timestamp.
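Before committing to explicit settings, it can help to check offline that you have read the split timestamp correctly. The following is only a rough Python sketch, not Splunk configuration: the pattern EVENT_RE and the helper parse_event_time are hypothetical names made up for illustration, and the format string uses Python's strptime codes, which differ from Splunk's TIME_FORMAT in places (for example, Splunk writes milliseconds as %3N where Python uses %f). It assumes every event starts with a day-first date and carries the time in square brackets later on the same line, as in the sample from the question.

```python
import re
from datetime import datetime

# Sample event copied from the question.
SAMPLE = "12/04/2017 MSI (c) (98:0C) [11:24:49:486]: Resetting cached policy values"

# Hypothetical pattern: day-first date at the start of the line, then the
# bracketed time with colon-separated milliseconds later in the same line.
EVENT_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}) .*?\[(\d{2}:\d{2}:\d{2}:\d{3})\]")


def parse_event_time(line):
    """Return the event time as a naive datetime, or None if the line does not match."""
    match = EVENT_RE.match(line)
    if match is None:
        return None
    date_part, time_part = match.groups()
    # Python's %f reads the three digits "486" as 486000 microseconds.
    return datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M:%S:%f")


if __name__ == "__main__":
    print(parse_event_time(SAMPLE))
```

Running a few hundred real lines through a check like this shows quickly whether any events fail to match, which is worth knowing before you replace "Auto" with fixed values. No timezone is applied here, so the result is whatever is written in the log line.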
The following document has guidance on the configuration options and provides several examples:
/opt/splunk/bin/splunk cmd btool props list msi_install_logs --debug