I am using Splunk 7.3.0 to ingest JSON Lines where the event timestamp is in ISO 8601 extended format.
In this particular JSON Lines data, which is from a proprietary source, the event timestamp is the first timestamp value in each incoming line.
By first, I am referring to the serialized JSON Lines input data, which might arrive in Splunk over a TCP network or from a file. I am aware of the following text in the JSON standard (ECMA-404):
The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.
The position of the event timestamp in each line is variable. And the event timestamp is not always associated with the same JSON property name.
Here are two simplified examples of lines of the incoming JSON Lines:
{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}
In the first line, the event timestamp is the start property value, which is the fourth property in the line.
In the second line, the event timestamp is the collected property value, which is the second property in the line.
Note that, as shown in these examples, the timestamps might or might not contain fractions of a second.
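To make the two timestamp shapes concrete, here is a minimal Python sketch (not Splunk code) that parses both sample values by trying a format with fractional seconds first and then one without. The helper name parse_iso and the two-format fallback are assumptions for illustration only; they say nothing about how Splunk itself interprets TIME_FORMAT.

```python
from datetime import datetime

# Candidate formats, most specific first. %f accepts 1 to 6 fractional digits,
# and %z accepts both "+08:00" and "Z" on Python 3.7 and later.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",  # with fractional seconds
    "%Y-%m-%dT%H:%M:%S%z",     # without fractional seconds
)

def parse_iso(value):
    """Hypothetical helper: parse an ISO 8601 extended timestamp string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not fit; try the next one
    raise ValueError("unrecognized timestamp: " + value)

# The two event timestamps from the sample lines above.
for sample in ("2019-10-22T13:00:00.01+08:00", "2019-10-22T13:15:00Z"):
    print(sample, "->", parse_iso(sample).isoformat())
```

In this sketch, a single format that requires a fractional part rejects the second sample, which is why the fallback is there.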
The timestamp is always within MAX_TIMESTAMP_LOOKAHEAD.
From my props.conf:
TIME_PREFIX = (?=\d{4}-\d{2}-\d{2}T)
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%:z
In TIME_PREFIX, I'm using a lookahead to identify the first occurrence in the line of a string that matches the start of an ISO 8601 extended format timestamp, such as 2019-10-30T...
I could extend the TIME_PREFIX to include the pattern of the subsequent time component, but I've chosen to limit the amount of regex processing, and quit at the "T" separator. Knowing my data, this is a safe match.
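The effect of the lookahead can be checked outside Splunk. The sketch below assumes that Python's re module treats this simple pattern the same way Splunk's regex engine does; the function name first_timestamp_offset is hypothetical, and the sample lines are the two shown earlier.

```python
import re

# Same pattern as the TIME_PREFIX value: a zero-width lookahead, so the match
# consumes no characters and sits exactly where the timestamp begins.
TIME_PREFIX = re.compile(r"(?=\d{4}-\d{2}-\d{2}T)")

def first_timestamp_offset(line):
    """Return the offset of the first ISO 8601-like date in line, or None."""
    match = TIME_PREFIX.search(line)
    return match.start() if match else None

lines = [
    '{"code":"abc-123","system":"mysys","tranid":"xyz",'
    '"start":"2019-10-22T13:00:00.01+08:00","cpu":0.05,'
    '"stop":"2019-10-22T13:00:00.02Z"}',
    '{"code":"def-456","collected":"2019-10-22T13:15:00Z",'
    '"errors":321,"#tran":54321}',
]

for line in lines:
    offset = first_timestamp_offset(line)
    # Show the text that starts at the match position, up to the closing quote.
    print(offset, line[offset:line.index('"', offset)])
```

Values such as abc-123 are skipped because they do not start with four digits followed by a hyphen, so the first match in each line is the start or collected value rather than stop.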
From the Splunk docs topic "How timestamp assignment works":
Most events do not require special timestamp handling. Splunk software automatically recognizes and extracts their timestamps.
In practice, this is true for the events described in this question.
Should I bother specifying TIME_PREFIX and TIME_FORMAT? Or should I not bother, and just fall back on Splunk's automatic extraction?
Two reasons I'm bothering, both based on my ignorance of the internals of Splunk's automatic timestamp extraction process (I've looked at datetime.xml):
TIME_PREFIX and TIME_FORMAT might be more performant. I haven't tested this.