Splunk Search

Why do I keep getting an error extracting the timestamp from formatted text data?

kenvanderheyden
Path Finder

Hi all,

I'm having an issue with timestamp extraction.
Trying to extract the timestamp from formatted text, and I can't alter the format of the text.

Problem is the text is space separated and the generated number of spaces between the date elements is not consistent due to the missing leading 0-s in the month and day parts.
Below are a few example lines of text:

ABE 1900 1 5 19 0 0.00 
ABE 1900 1 11 9 7 0.00 
UTSU 1900 1 18 7 46 0.00

I have tried the following settings to extract the timestamp at import:

TIME_FORMAT=%Y[ ]{1,2}%m[ ]{1,2}%d[ ]{1,2,3}%H[ ]{1,2}%M[ ]{1,2}%S
TIME_PREFIX=[ ]{4-10}
MAX_TIMESTAMP_LOOKAHEAD=50
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
disabled=false
pulldown_type=true

I'm guessing that I should reformat the string in the config before trying to recognize the date fields?
Any help would be appreciated.

Thanks,
Ken.

0 Karma
1 Solution

kenvanderheyden
Path Finder

I found a small comment on this page that explained my issue:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables
Thanks Greg for sharing the comment !

Apparently dates prior to 1970/01/01 faile to get recognized in timestamp matching.
Even if the regex or strptime string is correct, the date's are not recognized.

Splunk team, please find a solution for this. After all "big data" often includes "old data".

Regards,
Ken.

View solution in original post

kenvanderheyden
Path Finder

I found a small comment on this page that explained my issue:

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commontimeformatvariables
Thanks Greg for sharing the comment !

Apparently dates prior to 1970/01/01 faile to get recognized in timestamp matching.
Even if the regex or strptime string is correct, the date's are not recognized.

Splunk team, please find a solution for this. After all "big data" often includes "old data".

Regards,
Ken.

kenvanderheyden
Path Finder

Hello,

Thanks for you answer, however it does not seem to be working.

I have been experimenting with a small subset, altered data.
This is my test data I am trying to import:

ABE 1900 01 05 19 00 0.00 -3.000 102.000 7.0
ABE 1900 01 11 09 07 0.00 -5.000 148.000 7.0
UTSU 1900 01 18 07 46 0.00 44.500 148.500 6.7
ABE 1900 01 20 06 33 0.00 20.000 -105.000 7.3
UTSU 1900 01 31 19 22 0.00 48.000 146.000 7.5

The fields are tab sepparated.
There is no timestamp available according to the splunk import interface.

So I'm trying to import the file, and assign a timestamp to the events, extracted from the sepparated date fields: year, month, day, ...

I managed to transform the date fields into one field with any desired layout, an example:

[fixEarthquakeDates]
REGEX = (.{3,4})([\t]{1})([\d]{4})([\t]{1})([\d]{2})([\t]{1})([\d]{2})([\t]{1})([\d]{2})([\t]{1})([\d]{2})(.+)
FORMAT = $1$2$3-$5-$7 $9:$11:00.00 $12
DEST_KEY = _raw 
SOURCE_KEY = _raw

Formats the data as follows:

ABE 1900-01-05 19:00:00.00 0.00 -3.000 102.000 7.0
ABE 1900-01-11 09:07:00.00 0.00 -5.000 148.000 7.0
UTSU 1900-01-18 07:46:00.00 0.00 44.500 148.500 6.7
ABE 1900-01-20 06:33:00.00 0.00 20.000 -105.000 7.3

Any other format can be achieved with the regex transformation. (I tried several alternatives).

The error I keep getting is:
Could not use strptime to parse timestamp from "1900 01 05 19 00 0.00 -3.000 102.000 7.0".
Failed to parse timestamp. Defaulting to file modtime.

Current settings :

MAX_TIMESTAMP_LOOKAHEAD=0
NO_BINARY_CHECK=true
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y %m %d %H %M %S.%2N
TIME_PREFIX=\t
TRANSFORMS-timestamp=fixEarthquakeDates
TZ=UTC
disabled=false
pulldown_type=true

Have been playing with numerous alternatives for the TIME_FORMAT, to no effect.

Apparently the transformation happens after the parsing for a timestamp ?

Can someone confirm this ? Or point me in the right direction ?

Regards,
Ken.

0 Karma

ppablo
Community Manager
Community Manager

Please be sure that when responding to someone's answer, click on "Add comment" directly below their answer or, if responding to someone's comment, type in the "Add your comment..." box directly below their comment. You typed your response in the "Enter your answer here..." box at the very bottom of the page which, instead, posts a brand new answer when it was really meant as a comment. This will help with a clean continuous flow of the conversation.

Your "answer" can no longer be converted to a comment since it is beyond the character limit. If you have a long response and are hitting a character limit when leaving a comment, just break it up into multiple comments. So unless @richgalloway subscribed to email notifications for your post, he won't know you responded to him unless you comment on his answer. Just something to keep in mind from here on out. Thanks.

0 Karma

kenvanderheyden
Path Finder

Thanks, i will keep that in mind.

Regards,
Ken.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Your TIME_FORMAT string is incorrect. It must use strptime() strings, not regex. Try this:

TIME_FORMAT = %Y %m %d %H %M %S.%2N
---
If this reply helps you, an upvote would be appreciated.
0 Karma

kenvanderheyden
Path Finder

Hi,

Thanks for your suggestion.
I have been experimenting with it.

Up to now I could not find a way to fix this issue.
Keep seeing the error:
Could not use strptime to parse timestamp from "1931/09/21 13:34:00 EHB ADEQ 37.961 177.811 6.6".
Could not use regex to parse timestamp from "1931/09/21".

The line below is the raw format of the text. It's a simple line of text, with a clear date and time format.
Should be recognizable with %Y %m %d ...
1931/09/21 13:34:00 EHB ADEQ 37.961 177.811 6.6

I have tried filling in strptime() formated search strings and regex patterns. none seem to be able to find the date.

Open to suggestions from anyone.

Regards,
Ken.

0 Karma

kenvanderheyden
Path Finder

update:

I have tried the following:

Added to the transforms.conf in the etc/local folder:

[fixEarthquakeDates]
REGEX = ([\d]{4})([\s]{1,2})([\d]{1,2})([\s]{1,2})([\d]{1,2})([\s]{1,3})([\d]{1,2})([\s]{1,3})([\d]{1,2})([\s]{1,3})([\d]{1})
FORMAT = $1/$3/$5 $7:$9:$11
DEST_KEY = _raw
SOURCE_KEY = _raw

And altered the props.conf as follows:

[ earthquakes ]
TIME_FORMAT=%Y/%m/%d %H:%M:S
MAX_TIMESTAMP_LOOKAHEAD=0
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
disabled=false
pulldown_type=true
TRANSFORMS-timestamp=fixEarthquakeDates

Still results in an error "failed to parse timestamp".

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!