<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Which timestamp does indexer use as _time? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Which-timestamp-does-indexer-use-as-time/m-p/556423#M92102</link>
    <description>&lt;P&gt;When multiple timestamps exist in raw events, which one does the indexer pick as _time? &amp;nbsp;In most cases, Splunk picks the one that I would have preferred, even though I have no way to express that preference. &amp;nbsp;How is the decision made?&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In file ingestion, I can explicitly specify "TIMESTAMP_FIELDS". &amp;nbsp;If multiple are present, Splunk has to pick one of them.&lt;/LI&gt;&lt;LI&gt;In file monitoring, multiple fields may contain a timestamp. &amp;nbsp;Even with structured input such as CSV, I notice that the field name may not have a direct impact on which field is ultimately chosen. (I was once surprised when, for a field containing a text string concatenated with a numeric value that falls into the current epoch range, the numeric part was used as _time. &amp;nbsp;That was one of the rare obviously "wrong" choices by the indexer that I have noticed.)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;My most recent (pleasantly surprising) experience was with a JSON API source that comes with several timestamp fields that may or may not be populated, so I also had to add my own timestamp. &amp;nbsp;I named my field "timestamp" because I thought it would be best to just use it: in some cases, the other timestamp fields could be really stale. &amp;nbsp;I wouldn't mind if one of the "fresher" timestamps were used, though; in fact, I would prefer that a fresh timestamp from the original data be used.&lt;/P&gt;&lt;P&gt;Rather strangely, if I do not add this retrieval "timestamp", the indexer doesn't populate _time, which is bad. &amp;nbsp;But after I add my "timestamp" (somewhat reluctantly), the indexer picks my "timestamp" field if all other timestamp fields are either stale or blank; otherwise, it ignores my (artificial) "timestamp" field and picks a "fresh" timestamp from the original source as _time. 
&amp;nbsp;This is kind of optimal for me.&lt;/P&gt;&lt;P&gt;In the files that I produce from this API, there is no indication that "timestamp" is "artificial". &amp;nbsp;What criteria does Splunk use to determine that one of the original timestamps is "fresh" or "stale", and that my "timestamp" field could be "too fresh"?&lt;/P&gt;&lt;P&gt;Adding to my befuddlement, I added the same "timestamp" field on a different API (also JSON), except this time, the indexer does not return any _time at all.&lt;/P&gt;&lt;P&gt;If, on the other hand, I do not populate my own "timestamp" field, the indexer adds a "timestamp" field to the result, except the value is universally "none". &amp;nbsp;If I cheat by setting a field named "_time", the indexer populates a field "time" with that value.&lt;/P&gt;&lt;P&gt;At this point, I am at a dead end with this "other" API.&lt;/P&gt;&lt;P&gt;To help me think, I constructed this diagnostic matrix.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;API 1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;API 2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Several original timestamp fields, but no faked "timestamp" or "_time"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Same as API 1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Fake "timestamp"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;_time populated with desirable selection between original timestamps and faked "timestamp"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time, just "timestamp"&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Fake "_time"&lt;/TD&gt;&lt;TD width="33.333333333333336%" 
height="25px"&gt;(not tested)&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time, populates "time" instead.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;In all cases, my fake time fields are in fractional epoch, while original timestamp fields are in text format. &amp;nbsp;Both sourcetypes do &lt;EM&gt;not&lt;/EM&gt; &amp;nbsp;have&amp;nbsp;"TIMESTAMP_FIELDS" set.&lt;/P&gt;</description>
    <pubDate>Sun, 20 Jun 2021 21:57:42 GMT</pubDate>
    <dc:creator>yuanliu</dc:creator>
    <dc:date>2021-06-20T21:57:42Z</dc:date>
    <item>
      <title>Which timestamp does indexer use as _time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Which-timestamp-does-indexer-use-as-time/m-p/556423#M92102</link>
      <description>&lt;P&gt;When multiple timestamps exist in raw events, which one does the indexer pick as _time? &amp;nbsp;In most cases, Splunk picks the one that I would have preferred, even though I have no way to express that preference. &amp;nbsp;How is the decision made?&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In file ingestion, I can explicitly specify "TIMESTAMP_FIELDS". &amp;nbsp;If multiple are present, Splunk has to pick one of them.&lt;/LI&gt;&lt;LI&gt;In file monitoring, multiple fields may contain a timestamp. &amp;nbsp;Even with structured input such as CSV, I notice that the field name may not have a direct impact on which field is ultimately chosen. (I was once surprised when, for a field containing a text string concatenated with a numeric value that falls into the current epoch range, the numeric part was used as _time. &amp;nbsp;That was one of the rare obviously "wrong" choices by the indexer that I have noticed.)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;My most recent (pleasantly surprising) experience was with a JSON API source that comes with several timestamp fields that may or may not be populated, so I also had to add my own timestamp. &amp;nbsp;I named my field "timestamp" because I thought it would be best to just use it: in some cases, the other timestamp fields could be really stale. &amp;nbsp;I wouldn't mind if one of the "fresher" timestamps were used, though; in fact, I would prefer that a fresh timestamp from the original data be used.&lt;/P&gt;&lt;P&gt;Rather strangely, if I do not add this retrieval "timestamp", the indexer doesn't populate _time, which is bad. &amp;nbsp;But after I add my "timestamp" (somewhat reluctantly), the indexer picks my "timestamp" field if all other timestamp fields are either stale or blank; otherwise, it ignores my (artificial) "timestamp" field and picks a "fresh" timestamp from the original source as _time. 
&amp;nbsp;This is kind of optimal for me.&lt;/P&gt;&lt;P&gt;In the files that I produce from this API, there is no indication that "timestamp" is "artificial". &amp;nbsp;What criteria does Splunk use to determine that one of the original timestamps is "fresh" or "stale", and that my "timestamp" field could be "too fresh"?&lt;/P&gt;&lt;P&gt;Adding to my befuddlement, I added the same "timestamp" field on a different API (also JSON), except this time, the indexer does not return any _time at all.&lt;/P&gt;&lt;P&gt;If, on the other hand, I do not populate my own "timestamp" field, the indexer adds a "timestamp" field to the result, except the value is universally "none". &amp;nbsp;If I cheat by setting a field named "_time", the indexer populates a field "time" with that value.&lt;/P&gt;&lt;P&gt;At this point, I am at a dead end with this "other" API.&lt;/P&gt;&lt;P&gt;To help me think, I constructed this diagnostic matrix.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;API 1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;API 2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Several original timestamp fields, but no faked "timestamp" or "_time"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Same as API 1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Fake "timestamp"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;_time populated with desirable selection between original timestamps and faked "timestamp"&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time, just "timestamp"&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;Fake "_time"&lt;/TD&gt;&lt;TD width="33.333333333333336%" 
height="25px"&gt;(not tested)&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="25px"&gt;No _time, populates "time" instead.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;In all cases, my fake time fields are in fractional epoch, while original timestamp fields are in text format. &amp;nbsp;Both sourcetypes do &lt;EM&gt;not&lt;/EM&gt; &amp;nbsp;have&amp;nbsp;"TIMESTAMP_FIELDS" set.&lt;/P&gt;</description>
      <pubDate>Sun, 20 Jun 2021 21:57:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Which-timestamp-does-indexer-use-as-time/m-p/556423#M92102</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-06-20T21:57:42Z</dc:date>
    </item>
    <item>
      <title>Re: Which timestamp does indexer use as _time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Which-timestamp-does-indexer-use-as-time/m-p/556430#M92105</link>
      <description>&lt;P&gt;I have a partial answer now (a large part of it): it has to do with the sourcetype's &lt;EM&gt;implicit&lt;/EM&gt; MAX_TIMESTAMP_LOOKAHEAD property. &amp;nbsp;This property defaults to 128 characters and, unless you change it, it won't show in the sourcetype stanza in props.conf or in the GUI's Advanced view.&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;Both sourcetypes do &lt;EM&gt;not&lt;/EM&gt; &amp;nbsp;have&amp;nbsp;"TIMESTAMP_FIELDS" set.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;What was left unsaid is&amp;nbsp;INDEXED_EXTRACTIONS. &amp;nbsp;In both cases, I tested json and none. &amp;nbsp;With&amp;nbsp;INDEXED_EXTRACTIONS=json, I could specify&amp;nbsp;TIMESTAMP_FIELDS, but I didn't. &amp;nbsp;(You can say I really like to examine how automatic extraction works.)&lt;/P&gt;&lt;P&gt;API 1 happens to place a possible timestamp field before the 128-character mark, while API 2's first timestamp field comes after it. &amp;nbsp;My fake timestamp field (whatever I name it) comes at the end. &amp;nbsp;To get files from API 2 timestamped correctly, I can give&amp;nbsp;MAX_TIMESTAMP_LOOKAHEAD a large enough number, use TIME_PREFIX, or just use&amp;nbsp;INDEXED_EXTRACTIONS=json and set&amp;nbsp;TIMESTAMP_FIELDS.&lt;/P&gt;&lt;P&gt;It is interesting to know that&amp;nbsp;MAX_TIMESTAMP_LOOKAHEAD is still effective when&amp;nbsp;INDEXED_EXTRACTIONS=json (in the absence of&amp;nbsp;TIMESTAMP_FIELDS).&lt;/P&gt;&lt;P&gt;I still do not know&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;why API 1 won't auto-extract a timestamp without a fake "timestamp" field that sits way beyond the 128-character mark, and&lt;/LI&gt;&lt;LI&gt;why, with the fake "timestamp" appended to the end, the indexer seeks out my fake "timestamp" when the event's first timestamp contains a null value. (When all possible event timestamp fields are populated and relatively fresh, it sometimes picked another field. 
&amp;nbsp;All of this was without an explicit&amp;nbsp;MAX_TIMESTAMP_LOOKAHEAD, i.e., the value would be 128.)&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Sun, 20 Jun 2021 23:16:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Which-timestamp-does-indexer-use-as-time/m-p/556430#M92105</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2021-06-20T23:16:18Z</dc:date>
    </item>
  </channel>
</rss>

