Hello once again. I'm working in a distributed environment (Universal Forwarder > Heavy Forwarder > Indexer) with a particular log file that writes a timestamp on every line, though the "event" itself is actually a group of these lines, delimited by a specific header line. Here is what a couple of events in the log file look like:
[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------
[2016-04-11T01:00:10][D00034][T00041][Info] Processing...
[2016-04-11T01:00:10][D00034][T00041][Info] Retries disabled
[2016-04-11T01:00:10][D00034][T00041][Info] Company: xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] Filename: xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] RenderFormat: CSV
[2016-04-11T01:00:10][D00034][T00041][Info] domain_id: 2
[2016-04-11T01:00:10][D00034][T00041][Info] user_id: 27090
[2016-04-11T01:00:10][D00034][T00041][Info] template_id: 2100
[2016-04-11T01:00:10][D00034][T00041][Info] rdl: /xxxxxx/xxxxxxxxxxxx/xxxxxxxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] schedule_id: 1096
[2016-04-11T01:00:10][D00034][T00041][Info] lang_id: en_US
[2016-04-11T01:00:10][D00034][T00041][Info] result_name: xxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] result_description: xxxxxxx
[2016-04-11T01:00:10][D00034][T00041][Info] queued_result_id: 0
[2016-04-11T01:00:10][D00034][T00041][Info] metrics_smdb_key: 0
[2016-04-11T01:00:10][D00034][T00041][Info] schedule_type: 0
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded. Result ID=18226
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx is: disabled
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Complete
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (1 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] Post processing CSV...
[2016-04-11T01:06:10][D00034][T00041][Info] templateName=xxxxxxxxxxxxr
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (16 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : succeeded (63 rows)
[2016-04-11T01:06:10][D00034][T00041][Info] Metadata updated with user stat display names: 16
[2016-04-11T01:06:10][D00034][T00041][Info] Collating 'required' fields
[2016-04-11T01:06:10][D00034][T00041][Info] Collating 20 custom fields
[2016-04-11T01:06:10][D00034][T00041][Info] Creating processed csv stream (Unicode (UTF-8))
[2016-04-11T01:06:10][D00034][T00041][Info] CSV Processing Complete
[2016-04-11T01:06:11][D00034][T00041][Info] checkDirectoryExists: Full path found (\\xxxxxxxxxxxx\)
[2016-04-11T01:06:11][D00034][T00041][Info] createUniqueFile: Created file: xxxxxxxxxxxx.csv
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx: Report created: \\xxxxxxxxxxxx\xxxxxxxxxxxx\xxxxxxxxxxxx.csv
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx: File size: 317 bytes
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:06:11][D00034][T00041][Info] DeliverToEmail: Email sent
[2016-04-11T01:06:11][D00034][T00041][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:06:11][D00034][T00041][Info] Completed
[2016-04-11T01:06:11][D00034][T00041][Info] Exiting: true
[2016-04-11T01:46:08][D00034][T00038][Info] ------------------[7.3.1.18]-----------------------
[2016-04-11T01:46:08][D00034][T00038][Info] Processing...
[2016-04-11T01:46:08][D00034][T00038][Info] Retries disabled
[2016-04-11T01:46:08][D00034][T00038][Info] Company: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] Filename: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] RenderFormat: CSV
[2016-04-11T01:46:08][D00034][T00038][Info] domain_id: 2
[2016-04-11T01:46:08][D00034][T00038][Info] user_id: 3599
[2016-04-11T01:46:08][D00034][T00038][Info] template_id: 5051
[2016-04-11T01:46:08][D00034][T00038][Info] rdl: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] schedule_id: 4211
[2016-04-11T01:46:08][D00034][T00038][Info] lang_id: en_US
[2016-04-11T01:46:08][D00034][T00038][Info] result_name: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] result_description: xxxxxxxxxxxx
[2016-04-11T01:46:08][D00034][T00038][Info] queued_result_id: 5541
[2016-04-11T01:46:08][D00034][T00038][Info] metrics_smdb_key: 0
[2016-04-11T01:46:08][D00034][T00038][Info] schedule_type: 3
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx is: disabled
[2016-04-11T01:46:08][D00034][T00038][Info] Verifying Queued result exists: ResultID=5541
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:08][D00034][T00038][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : Rendering Complete
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (1 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] Post processing CSV...
[2016-04-11T01:46:36][D00034][T00038][Info] templateName=xxxxxxxxxxxx
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (0 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded (14 rows)
[2016-04-11T01:46:36][D00034][T00038][Info] User stat fields not required
[2016-04-11T01:46:36][D00034][T00038][Info] Collating 'required' fields
[2016-04-11T01:46:36][D00034][T00038][Info] Collating 9 custom fields
[2016-04-11T01:46:36][D00034][T00038][Info] Creating processed csv stream (Unicode (UTF-8))
[2016-04-11T01:46:36][D00034][T00038][Info] CSV Processing Complete
[2016-04-11T01:46:36][D00034][T00038][Info] checkDirectoryExists: Full path found (\\xxxxxxxxxxxx\xxxxxxxxxxxx\)
[2016-04-11T01:46:36][D00034][T00038][Info] createUniqueFile: Created file: xxxxxxxxxxxx.csv
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx: Report created: \\xxxxxxxxxxxx\xxxxxxxxxxxx.csv
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx: File size: 14,833,025 bytes
[2016-04-11T01:46:36][D00034][T00038][Info] xxxxxxxxxxxx : succeeded
[2016-04-11T01:46:36][D00034][T00038][Info] Completed
[2016-04-11T01:46:36][D00034][T00038][Info] Exiting: true
The start of every event should be this line:
[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------
To accomplish this goal, I've set the following configuration for the sourcetype on the heavy forwarder in props.conf:
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=^(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}\[.*)
BREAK_ONLY_BEFORE_DATE=false
pulldown_type=1
category=Custom
MAX_DIFF_SECS_HENCE=2147483646
MAX_DAYS_HENCE=7
MAX_EVENTS=1024
TRUNCATE=10000
MAX_TIMESTAMP_LOOKAHEAD=22
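For reference, here is how I intend that BREAK_ONLY_BEFORE pattern to line up against the header line, piece by piece (my own annotation):
\[\d+\-\d+\-\d+\w+:\d+:\d+\]   -> [2016-04-11T01:00:10]  (the \w+ covers the "T01" chunk)
\[\w+\]                        -> [D00034]
\[\w+]                         -> [T00041]  (the unescaped ] still matches a literal bracket)
\[\w+\]                        -> [Info]
.                              -> the space before the dashes
\-{18}\[.*                     -> ------------------[7.3.1.18]-----------------------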
This is only half working... it does in fact always break on this line:
[2016-04-11T01:00:10][D00034][T00041][Info] ------------------[7.3.1.18]-----------------------
But it is also breaking on -these- lines:
[2016-04-11T01:00:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Report (CSV)
[2016-04-11T01:06:10][D00034][T00041][Info] xxxxxxxxxxxx : Rendering Complete
If the timestamps are close enough together, it will not break. If they are far enough apart in time, it will break, regardless of whether the line matches the BREAK_ONLY_BEFORE regex. I ran btool props list on the heavy forwarder... these are the settings being picked up:
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE = ^(\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].\-{18}\[.*)
BREAK_ONLY_BEFORE_DATE = false
CHARSET = AUTO
DATETIME_CONFIG = \etc\datetime.xml
HEADER_MODE =
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 7
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 2147483646
MAX_EVENTS = 1024
MAX_TIMESTAMP_LOOKAHEAD = 22
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = true
TRANSFORMS =
TRUNCATE = 100000
category = Custom
detect_trailing_nulls = auto
maxDist = 100
priority =
pulldown_type = 1
sourcetype =
Any idea why these events are still breaking in the middle?
I just had a thought that someone might want to weigh in on... no matter how I configure props.conf, the events, for the most part, seem to break the same way. I can't find any rhyme or reason to it. I've looked at my regex, thinking maybe it was too greedy, and changed it to something more limited:
(\[\d+-\d+-\d+\w+:\d+:\d+\]\[\w+\]\[\w+\]\[\w+\].-{18})
I've even taken the regex out and changed several other settings... while this affects things in minor ways, the breaking works more or less the same. Thinking it was timestamp related, I set DATETIME_CONFIG to NONE... and it still doesn't work (though I can see the setting taking effect: the event times are now offset by a few seconds, matching Splunk's index time).
How is this for a theory... it occurred to me that maybe it's not the timestamps themselves causing the issue; maybe the timestamps just indicate when the file is being opened/closed by the app writing the log. What if the app effectively writes an "End of File" every time it adds a block of log data? How would the Splunk heavy forwarder handle that? How would it join these events if it keeps seeing an end of file for the log? Just a theory... maybe this is getting broken in some way that isn't related to the property settings at all.
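If that theory holds any water, the file-monitor settings in inputs.conf on the universal forwarder might be worth testing. Here's a sketch based on my reading of the inputs.conf spec, not something I've verified (the monitor path and sourcetype are placeholders):
[monitor://D:\logs\app.log]
sourcetype = my_app_log
# Per the spec, the monitor normally emits an event delimiter when it hits
# EOF on a newline; setting this to true makes it wait for more data instead
# of closing out a multi-line event right away (default: false).
multiline_event_extra_waittime = true
# Seconds to wait for new data before closing the file (default: 3);
# raising it keeps the file open across the app's write bursts.
time_before_close = 10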
Give this a try (props.conf on the heavy forwarder):
[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\[\d+\-\d+\-\d+\w+:\d+:\d+\]\[\w+\]\[\w+]\[\w+\].-{18}
TIME_PREFIX=^\[
TIME_FORMAT=%Y-%m-%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD=19
pulldown_type=1
category=Custom
MAX_DIFF_SECS_HENCE=2147483646
MAX_DAYS_HENCE=7
MAX_EVENTS=1024
TRUNCATE=10000
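The idea here: with SHOULD_LINEMERGE=false, the line-merging stage (and its date heuristics) is skipped entirely, and LINE_BREAKER alone decides the event boundaries. Wherever the regex matches, the text captured by the first group (the newlines, in this case) is discarded and becomes the boundary, so each new event starts at the [timestamp] in front of the dashed header line. Against your sample, the split should land like this (sketch):
[2016-04-11T01:06:11][D00034][T00041][Info] Exiting: true                                            <-- previous event ends here
[2016-04-11T01:46:08][D00034][T00038][Info] ------------------[7.3.1.18]-----------------------      <-- next event starts here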
Unfortunately, this didn't work... it still looks like the attached.
Something that might be interesting, though: I don't have this problem if I manually upload the log data onto the indexer and test it with the same parameters I've been using. There it seems to break the events correctly... so maybe this is some kind of bug?