Splunk (version 4.2.5) interprets timestamps that are embedded in an event as though it is a separate event. The target event is then effectively broken up (with the first "piece" terminating at the last line prior to the embedded time stamp) and another event is indexed for each subsequent embedded time stamp.
The event entry in the native log file looks like this:
2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-start-2014-11-24T21:21:00Z-/start-
-end-2014-11-24T21:21:00Z-/end-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
etc...
Splunk reacts to this in a few different ways,
A: When splunk indexes the “single” event above, it actually creates 3 events.
2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
Note that splunk actually does “look past” the second time tag and simply includes it as part of the message (this is the behavior we want for all subsequent embedded time stamps) but this event ends when splunk hits the next embedded time tag.
-start-2014-11-24T21:21:00Z-/start-
-end-2014-11-24T21:21:00Z-/end-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
etc...
B. sometimes Splunk creates just 2 events, then indicates “n lines omitted”
2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-start-2014-11-24T21:21:00Z-/start-
300 lines omitted….
Most of the events in the native log are typical events with a single event time stamp and some message text. Some events (like this one), have embedded xml, but no additional time stamps within the xml….these events are all indexed appropriately.
The events like the one here are few, but we need a way to tell splunk to “look beyond” other embedded time stamps when indexing events from this log without interfering with the proper indexing of other “normal events”
Thank you so much for any assistance offered.
MichaelS
You would need to configure the timestamp recognition and event breaking configuration explicitly in props.conf On Indexers, so that Splunk would not parse the events with its default settings.
Based on the sample entries in the question, this should work
props.conf (on Indexer)
[YourSourceType]
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2}
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^
I wouldn't recommend using that BREAK_ONLY_BEFORE
expression. That breaks any time it finds something looking a bit like a date at the beginning of a line - that's too weak, it'd be better to remove that setting and let Splunk break on timestamps. That takes the entire TIME_FORMAT
into consideration without having to specify the format twice.
Thank you so much somesoni2 and mmueller.
so to be clear the props.conf entry should look like this with the recommendation from somesoni1 as amended by recommendation from mmueller?
[YourSourceType]
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^
Thanks all. Almost there but can you help me with one more iteration?
I've added the following to my props.conf per your recommendations and I'm now properly indexing the target event will all included lines. BUT, when the log contains other event entries with the EXACT same timestamp, splunk is not indexing them (they do show up if I select "show source" - but the events are not indexed:
[YourSourceType]
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^
so in the raw log I have:
2014-12-10 03:31:14.843 TRACE [Thread-7] ...EventTxt
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText- - LOONG XML with embedded time stamps -
2014-12-10 03:31:14.844 DEBUG [Thread-7] ...EventText
2014-12-10 03:31:14.844 TRACE [pool-2-thread-2] ...EventText
2014-12-10 03:31:14.844 TRACE [pool-2-thread-2] ...EventText
2014-12-10 03:31:14.846 TRACE [pool-2-thread-2] ...EventText
but splunk indexes ONLY
2014-12-10 03:31:14.843 TRACE [Thread-7] ...EventTxt
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText- - LOONG XML with embedded time stamps -
2014-12-10 03:31:14.846 TRACE [pool-2-thread-2] ...EventText
Thanks so much for you continued help
That looks correct.