Getting Data In

Why is my production environment breaking logs differently than my lab?

Explorer

Good morning. I have some Tomcat logs in the format below that parse correctly in my lab but not in my production environment. The lab is reading a complete test/sample log, while the production environment is tailing "live" logs.

Both lab and production are running 6.3.1, although I also tried 6.4.3 in the lab and it still indexed properly with no change.

2016-08-12 11:50:10,241 INFO [org.jasig.cas.services.DefaultServicesManagerImpl] - Reloading registered services.
2016-08-12 11:50:16,410 INFO [org.jasig.cas.ticket.registry.DefaultTicketRegistry] - Beginning ticket cleanup...
2016-08-12 11:50:16,410 INFO [org.jasig.cas.ticket.registry.DefaultTicketRegistry] - 0 expired tickets found and removed.

2016-08-12 11:51:03,355 INFO [org.jasig.inspektr.audit.support.Slf5jLoggingAuditTrailManager] - Audit trail record BEGIN
=============================================================
WHO: cas9tg11
WHAT: Supplied credentials: [cas9tg11]
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: CAS
WHEN: Fri Aug 12 11:51:03 CDT 2016
CLIENT IP ADDRESS: 192.168.14.38
SERVER IP ADDRESS: cas.idm.mycompany.com
=============================================================

2016-08-12 11:51:05,039 INFO [org.jasig.inspektr.audit.support.Slf5jLoggingAuditTrailManager] - Audit trail record BEGIN
=============================================================
WHO: audit:unknown
WHAT: TGT-57-998bHKceoKhWacmcNYg7o2669zcRaWfgtrhy67eH3EO-cas.idm.mycompany.com
ACTION: TICKET_GRANTING_TICKET_CREATED
APPLICATION: CAS
WHEN: Fri Aug 12 11:51:05 CDT 2016
CLIENT IP ADDRESS: 192.168.14.38
SERVER IP ADDRESS: cas.idm.mycompany.com
=============================================================

My inputs.conf looks like:

[monitor:///usr/share/tomcat8_1/logs/cas-*.log]
disabled = false
sourcetype = cas_audit
index = app_idmcas

However, while the multi-line events look fine in my test lab, in production they split at the "WHEN" line:

e.g.

8/30/16 7:39:49.560 AM
2016-08-30 07:39:49,560 INFO [org.jasig.inspektr.audit.support.Slf5jLoggingAuditTrailManager] - Audit trail record BEGIN
=============================================================
WHO: 12fesm
WHAT: Supplied credentials: [12fesm]
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: CAS

host = idm-cas2.idm.mycompany.com     source = /usr/share/tomcat8_1/logs/cas-2016-08-30-07-1.log    sourcetype = cas_audit

8/30/16 7:39:49.000 AM  
WHEN: Tue Aug 30 07:39:49 CDT 2016
CLIENT IP ADDRESS: 192.168.136.105
SERVER IP ADDRESS: cas.idm.mycompany.com
=============================================================

host = idm-cas2.idm.mycompany.com     source = /usr/share/tomcat8_1/logs/cas-2016-08-30-07-1.log    sourcetype = cas_audit

So I ran the btool command on my indexers to compare the effective props.conf settings, and they are the same:

/opt/splunk/etc/system/local/props.conf   [cas_audit]
/opt/splunk/etc/system/default/props.conf ANNOTATE_PUNCT = True
/opt/splunk/etc/system/default/props.conf AUTO_KV_JSON = true
/opt/splunk/etc/system/local/props.conf   BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
/opt/splunk/etc/system/default/props.conf BREAK_ONLY_BEFORE_DATE = True
/opt/splunk/etc/system/default/props.conf CHARSET = AUTO
/opt/splunk/etc/system/default/props.conf DATETIME_CONFIG = /etc/datetime.xml
/opt/splunk/etc/system/default/props.conf HEADER_MODE =
/opt/splunk/etc/system/default/props.conf LEARN_SOURCETYPE = true
/opt/splunk/etc/system/default/props.conf LINE_BREAKER_LOOKBEHIND = 100
/opt/splunk/etc/system/default/props.conf MAX_DAYS_AGO = 2000
/opt/splunk/etc/system/default/props.conf MAX_DAYS_HENCE = 2
/opt/splunk/etc/system/default/props.conf MAX_DIFF_SECS_AGO = 3600
/opt/splunk/etc/system/default/props.conf MAX_DIFF_SECS_HENCE = 604800
/opt/splunk/etc/system/default/props.conf MAX_EVENTS = 256
/opt/splunk/etc/system/local/props.conf   MAX_TIMESTAMP_LOOKAHEAD = 30
/opt/splunk/etc/system/default/props.conf MUST_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf MUST_NOT_BREAK_AFTER =
/opt/splunk/etc/system/default/props.conf MUST_NOT_BREAK_BEFORE =
/opt/splunk/etc/system/default/props.conf SEGMENTATION = indexing
/opt/splunk/etc/system/default/props.conf SEGMENTATION-all = full
/opt/splunk/etc/system/default/props.conf SEGMENTATION-inner = inner
/opt/splunk/etc/system/default/props.conf SEGMENTATION-outer = outer
/opt/splunk/etc/system/default/props.conf SEGMENTATION-raw = none
/opt/splunk/etc/system/default/props.conf SEGMENTATION-standard = standard
/opt/splunk/etc/system/default/props.conf SHOULD_LINEMERGE = True
/opt/splunk/etc/system/default/props.conf TRANSFORMS =
/opt/splunk/etc/system/default/props.conf TRUNCATE = 10000
/opt/splunk/etc/system/default/props.conf detect_trailing_nulls = auto
/opt/splunk/etc/system/default/props.conf maxDist = 100
/opt/splunk/etc/system/default/props.conf priority =
/opt/splunk/etc/system/default/props.conf sourcetype =
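
For clarity, the non-default settings in the btool output above collapse to this local stanza (reassembled here for readability; file paths are as shown in the output):

```ini
# /opt/splunk/etc/system/local/props.conf
[cas_audit]
# Break a new event only before a timestamp like "2016-08-12 11:50:10,241"
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}
# Only scan the first 30 characters of the event for a timestamp
MAX_TIMESTAMP_LOOKAHEAD = 30
```

Everything else is inherited from /opt/splunk/etc/system/default/props.conf, including SHOULD_LINEMERGE = True.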

I've tried various other settings for BREAK_ONLY_BEFORE, changed BREAK_ONLY_BEFORE_DATE = false, and raised MAX_TIMESTAMP_LOOKAHEAD to 128; none of these changes seemed to have any effect.

Any ideas what I could try next? Thank you in advance for any assistance.

SplunkTrust

Usually when a "live" tail of a file causes different line breaking than reading the exact same file in bulk, I lean toward the time_before_close setting in inputs.conf. You might try increasing it in production and see if that makes any difference. Splunk can sometimes interpret the pause from buffered writes as "you've finished writing, let me process what I have," and close out the event mid-stream. Increasing time_before_close helps reduce that inclination.
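
As a sketch, using the monitor stanza from your question (the value of 10 is an illustrative guess to try, not a recommendation; the default is 3 seconds):

```ini
# inputs.conf on the production forwarder
[monitor:///usr/share/tomcat8_1/logs/cas-*.log]
disabled = false
sourcetype = cas_audit
index = app_idmcas
# Wait longer before treating a quiet file as "finished", so multi-line
# events written in separate buffered flushes are read as one chunk.
# Default is 3 seconds; 10 here is just an example value to experiment with.
time_before_close = 10
```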