<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Configured vs automatic extraction for timestamps in ISO 8601 extended format? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Configured-vs-automatic-extraction-for-timestamps-in-ISO-8601/m-p/467991#M80582</link>
    <description>&lt;H2&gt;Background to this question&lt;/H2&gt;

&lt;P&gt;I am using Splunk 7.3.0 to ingest JSON Lines where the event timestamp is in ISO 8601 extended format.&lt;/P&gt;

&lt;P&gt;In this particular JSON Lines data, which comes from a proprietary source, the event timestamp is the &lt;EM&gt;first&lt;/EM&gt; timestamp value in each incoming line.&lt;/P&gt;

&lt;P&gt;By &lt;EM&gt;first&lt;/EM&gt;, I am referring to the &lt;EM&gt;serialized&lt;/EM&gt; JSON Lines input data, which might arrive in Splunk over a TCP network or from a file. I am aware of the following text in the JSON standard (&lt;A href="http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf"&gt;ECMA-404&lt;/A&gt;):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;The position of the event timestamp in each line varies, and the event timestamp is not always associated with the same JSON property name.&lt;/P&gt;

&lt;P&gt;Here are two simplified examples of lines of the incoming JSON Lines:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In the first line, the event timestamp is the &lt;CODE&gt;start&lt;/CODE&gt; property value, which is the fourth property in the line.&lt;/P&gt;

&lt;P&gt;In the second line, the event timestamp is the &lt;CODE&gt;collected&lt;/CODE&gt; property value, which is the second property in the line.&lt;/P&gt;

&lt;P&gt;Note that, as shown in these examples, the timestamps might or might not contain fractions of a second.&lt;/P&gt;

&lt;P&gt;The event timestamp always falls within the &lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD&lt;/CODE&gt; limit.&lt;/P&gt;

&lt;H3&gt;Configured timestamp extraction&lt;/H3&gt;

&lt;P&gt;From my &lt;CODE&gt;props.conf&lt;/CODE&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;TIME_PREFIX = (?=\d{4}-\d{2}-\d{2}T)
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt;, I'm using a lookahead to identify the first occurrence in the line of a string that matches the start of an ISO 8601 extended format timestamp, such as 2019-10-30T...&lt;/P&gt;

&lt;P&gt;I could extend the &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; to include the pattern of the subsequent time component, but I've chosen to limit the amount of regex processing, and quit at the "T" separator. Knowing my data, this is a safe match.&lt;/P&gt;

&lt;H3&gt;Automatic timestamp extraction&lt;/H3&gt;

&lt;P&gt;From the Splunk docs topic "&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.3.0/Data/HowSplunkextractstimestamps"&gt;How timestamp assignment works&lt;/A&gt;":&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Most events do not require special timestamp handling. Splunk software automatically recognizes and extracts their timestamps.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;In practice, this is true for the events described in this question.&lt;/P&gt;

&lt;H2&gt;Finally, the question&lt;/H2&gt;

&lt;P&gt;&lt;STRONG&gt;Should I bother specifying &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt;?&lt;/STRONG&gt; Or should I not bother, and just fall back on Splunk's automatic extraction?&lt;/P&gt;

&lt;P&gt;Two reasons I'm bothering, both based on my ignorance of the internals of Splunk's automatic timestamp extraction process (I've looked at &lt;CODE&gt;datetime.xml&lt;/CODE&gt;):&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;There might be a chance that Splunk's automatic timestamp extraction "gets it wrong", depending on what values precede the ISO 8601-format event timestamp.&lt;/LI&gt;
&lt;LI&gt;Specifying &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt; might be more performant. I haven't tested this.&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Wed, 30 Oct 2019 07:36:55 GMT</pubDate>
    <dc:creator>Graham_Hanningt</dc:creator>
    <dc:date>2019-10-30T07:36:55Z</dc:date>
    <item>
      <title>Configured vs automatic extraction for timestamps in ISO 8601 extended format?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Configured-vs-automatic-extraction-for-timestamps-in-ISO-8601/m-p/467991#M80582</link>
      <description>&lt;H2&gt;Background to this question&lt;/H2&gt;

&lt;P&gt;I am using Splunk 7.3.0 to ingest JSON Lines where the event timestamp is in ISO 8601 extended format.&lt;/P&gt;

&lt;P&gt;In this particular JSON Lines data, which comes from a proprietary source, the event timestamp is the &lt;EM&gt;first&lt;/EM&gt; timestamp value in each incoming line.&lt;/P&gt;

&lt;P&gt;By &lt;EM&gt;first&lt;/EM&gt;, I am referring to the &lt;EM&gt;serialized&lt;/EM&gt; JSON Lines input data, which might arrive in Splunk over a TCP network or from a file. I am aware of the following text in the JSON standard (&lt;A href="http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf"&gt;ECMA-404&lt;/A&gt;):&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;The position of the event timestamp in each line varies, and the event timestamp is not always associated with the same JSON property name.&lt;/P&gt;

&lt;P&gt;Here are two simplified examples of lines of the incoming JSON Lines:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In the first line, the event timestamp is the &lt;CODE&gt;start&lt;/CODE&gt; property value, which is the fourth property in the line.&lt;/P&gt;

&lt;P&gt;In the second line, the event timestamp is the &lt;CODE&gt;collected&lt;/CODE&gt; property value, which is the second property in the line.&lt;/P&gt;
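
&lt;P&gt;To double-check the property positions described above, here is a minimal Python sketch (outside Splunk) that walks each example line in serialized order and reports the first property whose value looks like a timestamp. The name &lt;CODE&gt;first_timestamp_property&lt;/CODE&gt; and the &lt;CODE&gt;ISO_START&lt;/CODE&gt; pattern are assumptions made for illustration. It relies on Python's &lt;CODE&gt;json.loads&lt;/CODE&gt; returning a dict that keeps the order of the serialized text (Python 3.7 or later).&lt;/P&gt;

```python
import json
import re

# Assumed pattern for the start of an ISO 8601 extended-format timestamp.
ISO_START = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

# The two example lines from this question.
LINES = [
    '{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}',
    '{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}',
]


def first_timestamp_property(line):
    """Return (position, name, value) of the first timestamp-like property, or None."""
    record = json.loads(line)  # the dict keeps the serialized property order
    for position, (name, value) in enumerate(record.items(), start=1):
        if isinstance(value, str) and ISO_START.match(value):
            return position, name, value
    return None


for line in LINES:
    print(first_timestamp_property(line))
# (4, 'start', '2019-10-22T13:00:00.01+08:00')
# (2, 'collected', '2019-10-22T13:15:00Z')
```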

&lt;P&gt;Note that, as shown in these examples, the timestamps might or might not contain fractions of a second.&lt;/P&gt;
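
&lt;P&gt;To illustrate the two variants, here is a rough Python sketch that tries a format with fractional seconds first and then one without. It uses Python's &lt;CODE&gt;datetime.strptime&lt;/CODE&gt; (Python 3.7 or later), which is only a stand-in here and does not necessarily behave like Splunk's &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt; handling. The name &lt;CODE&gt;parse_iso_timestamp&lt;/CODE&gt; is hypothetical.&lt;/P&gt;

```python
from datetime import datetime, timedelta

# Candidate formats, tried in order: with and without fractional seconds.
# In Python 3.7+, %z accepts both "Z" and offsets such as "+08:00".
FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def parse_iso_timestamp(text):
    """Parse a timestamp that may or may not contain fractions of a second."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError("unrecognized timestamp: " + text)


with_fraction = parse_iso_timestamp("2019-10-22T13:00:00.01+08:00")
without_fraction = parse_iso_timestamp("2019-10-22T13:15:00Z")
print(with_fraction.microsecond, with_fraction.utcoffset() == timedelta(hours=8))
print(without_fraction.microsecond, without_fraction.utcoffset() == timedelta(0))
```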

&lt;P&gt;The event timestamp always falls within the &lt;CODE&gt;MAX_TIMESTAMP_LOOKAHEAD&lt;/CODE&gt; limit.&lt;/P&gt;

&lt;H3&gt;Configured timestamp extraction&lt;/H3&gt;

&lt;P&gt;From my &lt;CODE&gt;props.conf&lt;/CODE&gt;:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;TIME_PREFIX = (?=\d{4}-\d{2}-\d{2}T)
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt;, I'm using a lookahead to identify the first occurrence in the line of a string that matches the start of an ISO 8601 extended format timestamp, such as 2019-10-30T...&lt;/P&gt;

&lt;P&gt;I could extend the &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; to include the pattern of the subsequent time component, but I've chosen to limit the amount of regex processing, and quit at the "T" separator. Knowing my data, this is a safe match.&lt;/P&gt;
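
&lt;P&gt;As a sanity check of the lookahead, here is a small Python sketch that applies the same pattern to a raw line and shows the text that follows the zero-width match. Python's &lt;CODE&gt;re&lt;/CODE&gt; module is an assumption standing in for Splunk's regex engine, and &lt;CODE&gt;text_after_prefix&lt;/CODE&gt; and the &lt;CODE&gt;decoy&lt;/CODE&gt; line are made up for illustration. The decoy shows that a date-like value without the "T" separator is skipped.&lt;/P&gt;

```python
import re

# Same pattern as the TIME_PREFIX value above. Python's re module is used
# here only as a stand-in for Splunk's regex engine (an assumption).
TIME_PREFIX = re.compile(r"(?=\d{4}-\d{2}-\d{2}T)")


def text_after_prefix(line, width):
    """Return up to width characters starting where the prefix first matches."""
    match = TIME_PREFIX.search(line)
    if match is None:
        return None
    # The lookahead is zero-width, so the match ends where the timestamp starts.
    return line[match.end():match.end() + width]


line = '{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05}'
decoy = '{"code":"2019-10-22","collected":"2019-10-22T13:15:00Z"}'

print(text_after_prefix(line, 28))   # 2019-10-22T13:00:00.01+08:00
print(text_after_prefix(decoy, 20))  # 2019-10-22T13:15:00Z
```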

&lt;H3&gt;Automatic timestamp extraction&lt;/H3&gt;

&lt;P&gt;From the Splunk docs topic "&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.3.0/Data/HowSplunkextractstimestamps"&gt;How timestamp assignment works&lt;/A&gt;":&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;Most events do not require special timestamp handling. Splunk software automatically recognizes and extracts their timestamps.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;In practice, this is true for the events described in this question.&lt;/P&gt;

&lt;H2&gt;Finally, the question&lt;/H2&gt;

&lt;P&gt;&lt;STRONG&gt;Should I bother specifying &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt;?&lt;/STRONG&gt; Or should I not bother, and just fall back on Splunk's automatic extraction?&lt;/P&gt;

&lt;P&gt;Two reasons I'm bothering, both based on my ignorance of the internals of Splunk's automatic timestamp extraction process (I've looked at &lt;CODE&gt;datetime.xml&lt;/CODE&gt;):&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;There might be a chance that Splunk's automatic timestamp extraction "gets it wrong", depending on what values precede the ISO 8601-format event timestamp.&lt;/LI&gt;
&lt;LI&gt;Specifying &lt;CODE&gt;TIME_PREFIX&lt;/CODE&gt; and &lt;CODE&gt;TIME_FORMAT&lt;/CODE&gt; might be more performant. I haven't tested this.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 30 Oct 2019 07:36:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Configured-vs-automatic-extraction-for-timestamps-in-ISO-8601/m-p/467991#M80582</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2019-10-30T07:36:55Z</dc:date>
    </item>
  </channel>
</rss>

