Getting Data In

TIME_FORMAT strptime bug for %S: mitigation with non-conversion-specification characters?

woodcock
Esteemed Legend

According to this:

http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html

Which is referenced from here:

http://docs.splunk.com/Documentation/Splunk/6.1.3/Data/Configuretimestamprecognition:

This is how %S is supposed to work:

%S The seconds [00,60]; leading zeros are permitted but not required.

I suppose it is debatable how open-ended "leading zeros" should be, but clearly the context is "for values from 0 to 9", which are the only values in the range 0-60 that can have a leading zero.

What Splunk actually does is allow any number of leading zeros. This causes problems for me because my particular timestamp format uses percent-encoding for non-alphanumeric characters and looks like this:

1401020304052D0500

The encoding is YYMMDDhhmmssTZ, so the above value, decoded, is:

January 2, 2014 at 3:04:05 -05:00

My problematic props.conf configuration is this:

TIME_FORMAT = %y%m%d%H%M%S

The problem is that for the above example, instead of assessing ONLY 2 characters for %S, Splunk keeps consuming digits until it either runs out of digits or the next digit would push the value above 60. In other words, after correctly parsing "1401020304" as "January 2, 2014 at 3:04:", instead of parsing "052D0500" as "05"+"2D0500" and getting ss="05", it erroneously parses it as "052"+"D0500" and gets ss="52".
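
To make the difference concrete, here is a minimal Python sketch of the two behaviors. It only models what is described above; it is not Splunk's actual parser, and the function names `greedy_seconds` and `fixed_width_seconds` are made up for illustration.

```python
from typing import Tuple


def greedy_seconds(text: str, limit: int = 60) -> Tuple[int, str]:
    """Model of the reported behavior (an assumption, not Splunk's code):
    keep consuming digits while the accumulated value stays <= limit.
    Returns (seconds, unconsumed remainder)."""
    value = 0
    consumed = 0
    for ch in text:
        if not ch.isdigit():
            break
        candidate = value * 10 + int(ch)
        if candidate > limit:
            break
        value = candidate
        consumed += 1
    return value, text[consumed:]


def fixed_width_seconds(text: str) -> Tuple[int, str]:
    """Expected behavior: %S reads exactly two characters."""
    return int(text[:2]), text[2:]


tail = "052D0500"  # what is left after "1401020304" has been parsed
print(greedy_seconds(tail))       # (52, 'D0500')
print(fixed_width_seconds(tail))  # (5, '2D0500')
```

The greedy version swallows the "2" of the percent-encoded "2D" because 52 is still a valid seconds value.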

I have given up on being able to process the timezone (even with datetime.xml) but since all our events are in the same timezone and it never changes, I can just use:

TZ = US/Pacific

So my only remaining problem is how to make Splunk parse the seconds the way it should. If TIME_FORMAT accepted a regex, I could use a list of literals like this:

TIME_FORMAT = %y%m%d%H%M%S(2D|2d|2B\2b|\|)

Any ideas that will work?

1 Solution

martin_mueller
SplunkTrust

Assuming the timestamp is always at the start of the log line, you can do this:

MAX_TIMESTAMP_LOOKAHEAD=12
TIME_FORMAT=%y%m%d%H%M%S
TIME_PREFIX=^

Together, the prefix and the lookahead push the thirteenth character outside the scope of timestamp extraction, so %S can only ever see two digits. If the timestamp is not at the beginning of the line, set TIME_PREFIX to a regex that matches the text just before it.
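
As a rough illustration of why the twelve-character window helps, the following Python sketch truncates the event before parsing. It uses Python's own `datetime.strptime` as a stand-in for the timestamp parser, which is an assumption made purely for demonstration; `parse_with_lookahead` is a hypothetical helper, not a Splunk function.

```python
from datetime import datetime


def parse_with_lookahead(event: str, lookahead: int = 12) -> datetime:
    # Hypothetical helper: with the prefix anchored at the start of the line,
    # only the first `lookahead` characters are visible to the parser.
    window = event[:lookahead]
    return datetime.strptime(window, "%y%m%d%H%M%S")


event = "1401020304052D0500 some message"
print(parse_with_lookahead(event))  # 2014-01-02 03:04:05
```

Because the window ends after "05", the "2" from "2D" is never offered to %S.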

martin_mueller
SplunkTrust

Yeah, it only counts from the beginning of the event if TIME_PREFIX = ^ is set, because that regex anchors itself to the beginning and consumes zero characters.

martin_mueller
SplunkTrust

Is there something fixed to anchor to? For example, if your event looks like this:

blah blah variable length stuff started process foo 1401020304052D0500 blah blah

Then you could use that as your anchor:

TIME_PREFIX = started process \S+\s+

And still use the lookahead of twelve, despite the variable-length field directly in front of the timestamp.
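
A small Python sketch of that idea, assuming the lookahead is counted from the end of the prefix match. The helper `timestamp_window` is hypothetical and only models the windowing; it is not how Splunk is implemented.

```python
import re
from typing import Optional


def timestamp_window(event: str, time_prefix: str, lookahead: int) -> Optional[str]:
    """Return the text a parser would see: `lookahead` characters starting
    right after the first match of `time_prefix` (modeled behavior only)."""
    match = re.search(time_prefix, event)
    if match is None:
        return None
    start = match.end()
    return event[start:start + lookahead]


prefix = r"started process \S+\s+"
short = "blah blah variable length stuff started process foo 1401020304052D0500 blah blah"
longer = "other stuff started process a_much_longer_name 1401020304052D0500 blah"

print(timestamp_window(short, prefix, 12))   # 140102030405
print(timestamp_window(longer, prefix, 12))  # 140102030405
```

The variable-length process name is absorbed by the prefix regex, so the window always begins at the first digit of the timestamp.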

Got some sample events?

woodcock
Esteemed Legend

According to the Data Preview tool, whenever TIME_PREFIX is used, MAX_TIMESTAMP_LOOKAHEAD starts counting from the end of the prefix match (so not always from the beginning of the event), so this solution will work!
