Getting Data In

Which timestamp does the indexer use as _time?

yuanliu
SplunkTrust

When raw events contain multiple timestamps, which one does the indexer pick as _time? In most cases, Splunk picks the one I would have preferred, even though I have no way to express that preference. How is the decision made?

  1. In file ingestion, I can explicitly specify "TIMESTAMP_FIELDS". If more than one field is listed, Splunk has to pick one of them.
  2. In file monitoring, multiple fields may contain a timestamp. Even with structured input such as CSV, I notice that the field name does not seem to have a direct impact on which field is ultimately chosen. (I was once surprised when, for a field containing a text string concatenated with a numeric value that fell within the current epoch range, the numeric part was used as _time. That was one of the rare obviously "wrong" choices by the indexer that I have noticed.)
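The CSV surprise in item 2 is consistent with a recognizer that accepts any standalone digit run in a plausible epoch range, even when it is glued to text. Below is a hypothetical illustration of that behavior, not Splunk's actual rule: the pattern (exactly 10 digits beginning with 1, roughly 2001 to 2033 UTC), the sample value "INV1716200130", and the helper name naive_epoch_candidates are all my own assumptions.

```python
import re
from datetime import datetime, timezone
from typing import List

# Hypothetical, deliberately naive rule (NOT Splunk's): any run of exactly
# 10 digits starting with "1" that is not part of a longer digit run,
# optionally followed by a fraction. 1000000000..1999999999 epoch seconds
# covers roughly 2001-09-09 to 2033-05-18 UTC.
EPOCH_RE = re.compile(r"(?<!\d)(1\d{9})(?:\.\d+)?(?!\d)")


def naive_epoch_candidates(value: str) -> List[datetime]:
    """Return every epoch-looking timestamp embedded in a field value, as UTC."""
    return [datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            for m in EPOCH_RE.finditer(value)]


print([d.isoformat() for d in naive_epoch_candidates("INV1716200130")])
# ['2024-05-20T10:15:30+00:00']
print(naive_epoch_candidates("12345678901"))  # [] (11 digits, rejected)
```

Text in front of the digits does not stop this kind of matcher, which is exactly the kind of "wrong" pick described above.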

My most recent experience (a pleasant surprise) was with a JSON API source that has several timestamp fields, any of which may or may not be populated, so I had to add my own timestamp as well. I named my field "timestamp" because I thought it would be best to use it outright: in some cases, the other timestamp fields can be really stale. That said, I wouldn't mind if one of the "fresher" original timestamps were used; in fact, I would prefer that a fresh timestamp from the original data be used.

Rather strangely, if I do not add this retrieval "timestamp", the indexer doesn't populate _time at all, which is bad. But after I (somewhat reluctantly) add my "timestamp", the indexer picks my "timestamp" field when all other timestamp fields are either stale or blank; otherwise it ignores my artificial "timestamp" and picks a "fresh" timestamp from the original source as _time. This is close to optimal for me.

Nothing in the files I produce from this API indicates that "timestamp" is artificial. What criteria does Splunk use to decide that one of the original timestamps is "fresh" or "stale", and that my "timestamp" field could be "too fresh"?

Adding to my befuddlement, I added the same "timestamp" field to a different API (also JSON), but this time the indexer does not populate _time at all.

If, on the other hand, I do not populate my own "timestamp" field, the indexer adds a "timestamp" field to the result, but its value is always "none". If I cheat by setting a field named "_time", the indexer populates a field named "time" with that value instead.

At this point, I am at a dead end with this "other" API.

To help me think, I construct this diagnostic matrix.

  1. Several original timestamp fields, but no fake "timestamp" or "_time"
     API 1: no _time
     API 2: no _time (same as API 1)
  2. Fake "timestamp"
     API 1: _time populated with a desirable selection among the original timestamps and the fake "timestamp"
     API 2: no _time, just "timestamp"
  3. Fake "_time"
     API 1: (not tested)
     API 2: no _time; a field named "time" is populated instead

In all cases, my fake time fields are in fractional epoch format, while the original timestamp fields are text. Neither sourcetype has "TIMESTAMP_FIELDS" set.
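A minimal sketch of how such a retrieval timestamp might be appended, assuming a Python producer built on the standard json module. The real producer script isn't shown here, and add_retrieval_timestamp is a hypothetical name. The point is that a fractional-epoch "timestamp" added last ends up at the very end of each serialized event, after all the original text timestamps.

```python
import json
import time
from typing import Any, Dict, Optional


def add_retrieval_timestamp(record: Dict[str, Any],
                            now: Optional[float] = None) -> str:
    """Serialize one API record as a JSON line with "timestamp" as the last key.

    The value is fractional epoch seconds (time.time() unless `now` is given),
    while the record's own timestamp fields stay as text.
    """
    stamped = dict(record)            # copy, keeping the API's key order
    stamped.pop("timestamp", None)    # re-insert below so it is always last
    stamped["timestamp"] = time.time() if now is None else now
    return json.dumps(stamped)


line = add_retrieval_timestamp(
    {"id": 1, "created": "2024-05-20T10:15:30Z", "updated": None},
    now=1716200130.5,
)
print(line)
# {"id": 1, "created": "2024-05-20T10:15:30Z", "updated": null, "timestamp": 1716200130.5}
```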


yuanliu
SplunkTrust

I now have a partial answer (covering a large part of it): it has to do with the sourcetype's implicit MAX_TIMESTAMP_LOOKAHEAD property. This property defaults to 128 and, unless you change it, it doesn't appear in the sourcetype's props.conf stanza or in the GUI's Advanced view.
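A simplified model of what that lookahead means, assuming the timestamp has to fall within the first MAX_TIMESTAMP_LOOKAHEAD characters of the event (or of the text following the TIME_PREFIX match, if one is set). The regular expression, the sample events, and the helper name find_timestamp are hypothetical; Splunk's real recognizer understands far more formats, so only the windowing behavior is the point here.

```python
import re
from typing import Optional

# Hypothetical stand-in for Splunk's timestamp recognizer:
# an ISO-8601-like "YYYY-MM-DDTHH:MM:SS" (or space-separated) date-time.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def find_timestamp(raw: str, lookahead: int = 128,
                   time_prefix: Optional[str] = None) -> Optional[str]:
    """Return the first timestamp-like string inside the lookahead window, else None.

    The window is the first `lookahead` characters of the event, or of the
    text right after the first `time_prefix` regex match when one is given.
    """
    start = 0
    if time_prefix is not None:
        prefix = re.search(time_prefix, raw)
        if prefix is None:
            return None  # prefix not found: no timestamp in this model
        start = prefix.end()
    window = raw[start:start + lookahead]
    match = TS_PATTERN.search(window)
    return match.group(0) if match else None


early = '{"created": "2024-05-20T10:15:30Z", "id": 1}'
late = '{"description": "' + "x" * 150 + '", "created": "2024-05-20T10:15:30Z"}'

print(find_timestamp(early))                                # 2024-05-20T10:15:30
print(find_timestamp(late))                                 # None
print(find_timestamp(late, lookahead=512))                  # 2024-05-20T10:15:30
print(find_timestamp(late, time_prefix=r'"created":\s*"'))  # 2024-05-20T10:15:30
```

The last two calls mirror the two remedies: widen the window, or move its starting point to just before the field you want.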


On "Both sourcetypes do not have TIMESTAMP_FIELDS set": what I left unsaid is INDEXED_EXTRACTIONS. For both sourcetypes, I tested json and none. With INDEXED_EXTRACTIONS=json I could have specified TIMESTAMP_FIELDS, but I didn't. (You could say I really like examining how automatic extraction works.)

API 1 happens to place a candidate timestamp field before the 128-character mark, while API 2's first timestamp field comes after it. My fake timestamp field (whatever I name it) comes at the end. To get files from API 2 timestamped correctly, I can give MAX_TIMESTAMP_LOOKAHEAD a large enough value, use TIME_PREFIX, or simply use INDEXED_EXTRACTIONS=json and set TIMESTAMP_FIELDS.
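To check which side of the 128-character mark each candidate field lands on, a rough offset check over the raw JSON line can help. This is a hypothetical diagnostic: field_offsets and within_lookahead are my own names, and the text search is naive (a key name that also appears inside a string value could be matched by mistake). It only reports where a value starts, so a value starting just inside the window may still be cut off by it.

```python
import re
from typing import Dict, Iterable, Optional


def field_offsets(raw: str, fields: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each field name to the character offset where its value starts.

    Rough text search for '"name":' in the raw JSON line; None if absent.
    """
    offsets: Dict[str, Optional[int]] = {}
    for name in fields:
        m = re.search(r'"' + re.escape(name) + r'"\s*:\s*', raw)
        offsets[name] = m.end() if m else None
    return offsets


def within_lookahead(raw: str, fields: Iterable[str],
                     lookahead: int = 128) -> Dict[str, bool]:
    """True for fields whose value starts inside the first `lookahead` characters."""
    return {name: off is not None and off < lookahead
            for name, off in field_offsets(raw, fields).items()}


event = ('{"note": "' + "y" * 200 + '", "updated": "2024-05-20T10:15:30Z", '
         '"timestamp": 1716200130.5}')
print(field_offsets(event, ["updated", "timestamp"]))
print(within_lookahead(event, ["note", "updated", "timestamp"]))
# {'note': True, 'updated': False, 'timestamp': False}
```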

Interestingly, MAX_TIMESTAMP_LOOKAHEAD still takes effect with INDEXED_EXTRACTIONS=json when TIMESTAMP_FIELDS is absent.

What I still do not know:

  1. why API 1 won't auto-extract _time without a fake "timestamp" field, even though that field sits well beyond the 128-character mark; and
  2. why, with a fake "timestamp" appended to the end, the indexer picks my fake "timestamp" when the event's first timestamp field is null. (When all candidate timestamp fields in the event are populated and relatively fresh, it sometimes picks another field. All of this is without an explicit MAX_TIMESTAMP_LOOKAHEAD, i.e., with the default value of 128.)
