When multiple timestamps exist in raw events, which one does the indexer pick as _time? In most cases, Splunk picks the one I would have preferred, even though I never told it which one to prefer. How is the decision made?
My most recent (pleasantly surprising) experience was with a JSON API source that comes with several timestamp fields that may or may not be populated, so I also had to add a timestamp of my own. I named my field "timestamp", thinking it would be safest to use it because the other timestamp fields can sometimes be really stale. That said, I wouldn't mind if one of the "fresher" original timestamps were used; in fact, I would prefer a fresh timestamp from the original data.
Rather strangely, if I do not add this retrieval "timestamp", the indexer doesn't populate _time, which is bad. But after I add my "timestamp" (somewhat reluctantly), the indexer picks my "timestamp" field if all other timestamp fields are either stale or blank; otherwise it ignores my (artificial) "timestamp" field and picks a "fresh" timestamp from the original source as _time. This is more or less optimal for me.
In the files that I produce from this API, there is no indication that "timestamp" is "artificial". What criteria does Splunk use to determine that one of the original timestamps is "fresh" or "stale", and that my "timestamp" field could be "too fresh"?
Adding to my befuddlement, I added the same "timestamp" field to a different API (also JSON), except this time the indexer does not return any _time at all.
If, on the other hand, I do not populate my own "timestamp" field, the indexer adds a "timestamp" field to the result, except the value is universally "none". If I cheat by setting a field named "_time", the indexer populates a field "time" with that value.
At this point, I am at a dead end with this "other" API.
To help me think, I construct this diagnostic matrix.
| | API 1 | API 2 |
| Several original timestamp fields, but no faked "timestamp" or "_time" | No _time | = (same as API 1) |
| Fake "timestamp" | _time populated with desirable selection between original timestamps and faked "timestamp" | No _time, just "timestamp" |
| Fake "_time" | (not tested) | No _time, populates "time" instead |
In all cases, my fake time fields are in fractional epoch, while the original timestamp fields are in text format. Neither sourcetype has "TIMESTAMP_FIELDS" set.
I now have a partial answer (a large part of it): it has to do with the sourcetype's implicit MAX_TIMESTAMP_LOOKAHEAD property. This property defaults to 128 characters and, unless you change it, it won't show in props.conf's sourcetype stanza or in the GUI's Advanced view.
What is left unsaid is INDEXED_EXTRACTIONS. In both cases, I tested json and none. With INDEXED_EXTRACTIONS=json, I can specify TIMESTAMP_FIELDS but I didn't. (You can say I really like to examine how automatic extraction works.)
API 1 happens to place a possible timestamp field before the 128-character mark, while API 2's first timestamp field comes after it. My fake timestamp field (whatever I name it) comes at the end. To get files from API 2 timestamped correctly, I can give MAX_TIMESTAMP_LOOKAHEAD a large enough number, use TIME_PREFIX, or just use INDEXED_EXTRACTIONS=json and set TIMESTAMP_FIELDS.
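For anyone who wants to try these three options, here is a rough props.conf sketch. The stanza name api2_json and the lookahead values 1000 and 32 are placeholders I made up for illustration; the field name "timestamp" is the fake field described above. The TIME_PREFIX regex assumes the key is written as "timestamp": in the raw JSON. Treat this as a starting point under those assumptions, not a verified configuration.

```ini
# props.conf -- sketch only; [api2_json] is a hypothetical sourcetype name.
# Enable ONE of the three options below.

[api2_json]

# Option 1: look further into each event for a timestamp.
# The implicit default is 128 characters; 1000 is an arbitrary example value
# that must be large enough to reach the first usable timestamp.
MAX_TIMESTAMP_LOOKAHEAD = 1000

# Option 2: start the timestamp search right after a known key.
# When TIME_PREFIX is set, MAX_TIMESTAMP_LOOKAHEAD is counted from the end
# of the TIME_PREFIX match, so a small value is enough.
# A TIME_FORMAT may also be needed for epoch values (for example one built
# on %s); that is not shown because it depends on the exact data.
# TIME_PREFIX = "timestamp"\s*:\s*"?
# MAX_TIMESTAMP_LOOKAHEAD = 32

# Option 3: structured parsing with an explicitly named timestamp field.
# INDEXED_EXTRACTIONS = json
# TIMESTAMP_FIELDS = timestamp
```

Option 3 removes the dependence on where the field sits in the event, which is why it is probably the most robust choice when the field order of the API output is not under your control.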
It is interesting to know that MAX_TIMESTAMP_LOOKAHEAD is still effective when INDEXED_EXTRACTIONS=json (in the absence of TIMESTAMP_FIELDS).
I still do not know