I'm having trouble getting events to break at the timestamp for a particular sourcetype.
RAW
2013-03-13T15:32:52.247-0700: 103395.597: [Full GC (System) [PSYoungGen: 192K->0K(20160K)] [ParOldGen: 17487K->16257K(43712K)] 17679K->16257K(63872K) [PSPermGen: 28027K->28027K(47232K)], 0.5712670 secs] [Times: user=0.16 sys=0.00, real=0.57 secs]
Splunk Parsed:
2013-03-14T08:50:15.353-0700: 63009.133: [GC
Desired survivor size 25559040 bytes, new threshold 1 (max 15)
[PSYoungGen: 53440K->21216K(56960K)] 92645K->60485K(122496K), 0.4307669 secs]
[Times: user=2.27 sys=0.03, real=0.43 secs]
---> this is a new event, should be merged with the line above.
2013-03-14T13:37:19.653-0700: 80232.893: [GC
Desired survivor size 28311552 bytes, new threshold 1 (max 15)
[PSYoungGen: 56544K->21216K(60288K)] 95813K->60549K(125824K), 0.4341336 secs]
props.conf
[iccsgclog]
SHOULD_LINEMERGE = true
TRANSFORMS-iccslogs = iccs-fields
REPORT-iccs = slc_details, slc_fields, slc_taxon
MAX_TIMESTAMP_LOOKAHEAD = 40
BREAK_ONLY_BEFORE = \d+-\d+-\d+\w\d+:\d+:\d+.\d+-\d+:
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
What am I missing?
In addition to what emiller42 says, use this instead for slightly better results:
[iccsgclog]
# SHOULD_LINEMERGE = true
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d+-\d+-\d+\w\d+:\d+:\d+\.\d+-\d+:)
TRANSFORMS-iccslogs = iccs-fields
REPORT-iccs = slc_details, slc_fields, slc_taxon
MAX_TIMESTAMP_LOOKAHEAD = 40
# BREAK_ONLY_BEFORE = \d+-\d+-\d+\w\d+:\d+:\d+.\d+-\d+:
# TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
The problem is that GC log lines aren't buffered, so there is often a significant delay between when the first part of an event is written and when the second part is written. (On abnormally long collections, a single event can be split into even more chunks.) By default, Splunk waits three seconds after reaching the end of the file before closing it, and anything written after that starts a new event. There is a setting for that, and since your GC logs actually have a timestamp (mine don't), this may help:
In your inputs.conf, try setting the time_before_close parameter to a higher value (the default is 3 seconds).
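A minimal sketch of what that inputs.conf change might look like. The monitor path is hypothetical and the value of 10 seconds is only an illustration; pick something longer than your longest expected collection pause. The sourcetype name is taken from the props.conf above.

```ini
# inputs.conf (sketch; the monitor path below is a placeholder, not from the original post)
[monitor:///path/to/your/gc.log]
sourcetype = iccsgclog
# Seconds to wait after reaching end-of-file before closing the file.
# Default is 3; 10 is an illustrative value, not a recommendation.
time_before_close = 10
```

A higher value delays indexing of the last event in the file by up to that many seconds, so keep it as small as your GC pauses allow.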
Another option is to not parse gc logs at all. Instead, use something like SPLUNK4JMX to poll the JVM for info around garbage collection.