Configured vs automatic extraction for timestamps in ISO 8601 extended format?

Graham_Hanningt — Wed, 30 Oct 2019 07:36:55 GMT

Background to this question

I am using Splunk 7.3.0 to ingest JSON Lines where the event timestamp is in ISO 8601 extended format.

In this particular JSON Lines data, which comes from a proprietary source, the event timestamp is the first timestamp value in each incoming line.

By "first", I mean first in the serialized JSON Lines input data, which might arrive in Splunk over a TCP network connection or from a file. I am aware of the following text in the JSON standard (ECMA-404):

The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

The position of the event timestamp in each line is variable, and the event timestamp is not always associated with the same JSON property name.

Here are two simplified examples of lines of the incoming JSON Lines:

{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}

In the first line, the event timestamp is the start property value, which is the fourth property in the line.

In the second line, the event timestamp is the collected property value, which is the second property in the line.

Note that, as shown in these examples, the timestamps might or might not contain fractions of a second.
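To pin down what "first timestamp value" means for these lines, here is a minimal sketch in Python (an illustration only, not how Splunk reads events). It parses a line with `object_pairs_hook=list`, which keeps the name/value pairs in the order they appear in the serialized text, and returns the first top-level property whose value looks like an ISO 8601 extended timestamp with optional fractional seconds. The function name `first_timestamp_property` and the regex are hypothetical helpers for this example.

```python
import json
import re

# Hypothetical pattern for a complete ISO 8601 extended date-time such as
# 2019-10-22T13:00:00.01+08:00 or 2019-10-22T13:15:00Z
# (fractional seconds optional, offset either Z or +hh:mm/-hh:mm).
ISO_8601_EXTENDED = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)


def first_timestamp_property(line):
    """Return (name, value) of the first top-level property whose value is
    an ISO 8601 extended timestamp, or None if there is none.

    object_pairs_hook=list yields the pairs in serialized order, so "first"
    means first in the text, regardless of how JSON itself treats ordering.
    """
    pairs = json.loads(line, object_pairs_hook=list)
    for name, value in pairs:
        if isinstance(value, str) and ISO_8601_EXTENDED.fullmatch(value):
            return name, value
    return None


lines = [
    '{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}',
    '{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}',
]
for line in lines:
    print(first_timestamp_property(line))
# ('start', '2019-10-22T13:00:00.01+08:00')
# ('collected', '2019-10-22T13:15:00Z')
```

Selecting by position rather than by property name is what makes this work when the name varies from line to line.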

The event timestamp always falls within the first MAX_TIMESTAMP_LOOKAHEAD characters of the line.
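A claim like this can be checked against sample lines by measuring where each timestamp ends. This Python sketch assumes the props.conf default MAX_TIMESTAMP_LOOKAHEAD of 128 characters, counted from the start of the event (the limit is counted from the end of the TIME_PREFIX match when TIME_PREFIX is set). The helper names `timestamp_span` and `within_lookahead` are hypothetical.

```python
import re

# Assumption: the props.conf default value of MAX_TIMESTAMP_LOOKAHEAD.
MAX_TIMESTAMP_LOOKAHEAD = 128

# A complete ISO 8601 extended date-time: optional fractional seconds,
# then either Z or a +hh:mm/-hh:mm offset.
ISO_8601_EXTENDED = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
)


def timestamp_span(line):
    """Return (start, end) offsets of the first ISO 8601 extended
    timestamp in the raw line, or None if the line has none."""
    match = ISO_8601_EXTENDED.search(line)
    return match.span() if match else None


def within_lookahead(line, limit=MAX_TIMESTAMP_LOOKAHEAD):
    """True if the whole first timestamp lies inside the first `limit`
    characters of the line (offsets counted from the start of the line)."""
    span = timestamp_span(line)
    return span is not None and span[1] <= limit


lines = [
    '{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}',
    '{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}',
]
for line in lines:
    print(timestamp_span(line), within_lookahead(line))
# (59, 87) True
# (31, 51) True
```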

Configured timestamp extraction

From my props.conf:

TIME_PREFIX = (?=\d{4}-\d{2}-\d{2}T)
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z

In TIME_PREFIX, I'm using a lookahead to identify the first occurrence in the line of a string that matches the start of an ISO 8601 extended format timestamp, such as 2019-10-30T...
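Python's `re` module supports the same `(?=...)` lookahead construct, so the behavior of this TIME_PREFIX can be checked outside Splunk. The sketch below shows that the lookahead is zero-width: the match starts and ends at the same offset, immediately before the year, so the text that follows is exactly what TIME_FORMAT has to parse. It models only the prefix, not Splunk's extraction pipeline.

```python
import re

# The TIME_PREFIX value from props.conf, compiled with Python's re module.
TIME_PREFIX = re.compile(r"(?=\d{4}-\d{2}-\d{2}T)")

line = ('{"code":"abc-123","system":"mysys","tranid":"xyz",'
        '"start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,'
        '"stop":"2019-10-22T13:00:00.02Z"}')

match = TIME_PREFIX.search(line)  # first occurrence only; "stop" is later
# A lookahead consumes no characters, so start and end are equal.
print(match.start(), match.end())  # 59 59
# Timestamp parsing would begin where the prefix match ends.
print(line[match.end():match.end() + 28])  # 2019-10-22T13:00:00.01+08:00
```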

I could extend TIME_PREFIX to match the subsequent time component too, but I've chosen to limit the amount of regex processing and stop at the "T" separator. Given what I know of my data, this is a safe match.
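Once the prefix has positioned extraction, TIME_FORMAT has to cope with both sample values: one with fractional seconds and a +08:00 offset, one with no fraction and a Z designator. As a rough Python analogue of that parsing step (not Splunk's own strptime implementation), this sketch picks a format with or without fractional seconds and normalizes the result to UTC. The function name `parse_iso8601_extended` is hypothetical; it relies on Python 3.7+ `%z` accepting both `Z` and `+hh:mm`.

```python
from datetime import datetime, timezone


def parse_iso8601_extended(text):
    """Parse an ISO 8601 extended date-time with optional fractional
    seconds and a Z or +hh:mm/-hh:mm offset; return it in UTC."""
    # %f accepts one to six fractional digits; %z accepts "Z" and
    # colon-separated offsets (Python 3.7 and later).
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z" if "." in text else "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(text, fmt).astimezone(timezone.utc)


for value in ["2019-10-22T13:00:00.01+08:00", "2019-10-22T13:15:00Z"]:
    print(parse_iso8601_extended(value).isoformat())
# 2019-10-22T05:00:00.010000+00:00
# 2019-10-22T13:15:00+00:00
```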

Automatic timestamp extraction

From the Splunk docs topic "How timestamp assignment works":

Most events do not require special timestamp handling. Splunk software automatically recognizes and extracts their timestamps.

In practice, this is true for the events described in this question.

Finally, the question

Should I bother specifying TIME_PREFIX and TIME_FORMAT? Or should I not bother, and just fall back on Splunk's automatic extraction?

Here are two reasons I'm bothering, both stemming from my ignorance of the internals of Splunk's automatic timestamp extraction process (I've looked at datetime.xml):

Splunk's automatic timestamp extraction might "get it wrong", depending on what values precede the ISO 8601-format event timestamp.
Specifying TIME_PREFIX and TIME_FORMAT might be more performant. I haven't tested this.
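To make the first concern concrete, this Python sketch uses a hypothetical line (not taken from the data described above) in which a date-only value precedes the event timestamp. A looser date-only pattern, standing in for any rule that stops at the first date-like text, lands on the wrong value, whereas the configured TIME_PREFIX, which requires the "T" separator, lands on the event timestamp. It illustrates the general risk only and does not reproduce the logic in datetime.xml.

```python
import re

# Hypothetical event line: "batch" is a date-only value that precedes the
# real event timestamp in "start". Not taken from the data in the question.
line = '{"code":"ghi-789","batch":"2019-10-21","start":"2019-10-22T13:00:00Z"}'

# Stand-in for a looser rule that accepts the first date-like text.
LOOSE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# The configured TIME_PREFIX, which also requires the "T" separator.
TIME_PREFIX = re.compile(r"(?=\d{4}-\d{2}-\d{2}T)")

loose = LOOSE_DATE.search(line)
prefix = TIME_PREFIX.search(line)
print(line[loose.start():loose.end()])         # 2019-10-21
print(line[prefix.end():prefix.end() + 20])    # 2019-10-22T13:00:00Z
```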
