Getting Data In

Timestamp hour without leading zero

Contributor

I have a problem regarding the time stamp recognition in one of my log types. The one affected is a checkpoint export which I cannot change in format as it is delivered by a 3rd party company every night.

The time stamps - as you can see in the example lines below - have a format like
30Nov2011;23:59:58

The extraction works well for hours with 2 digits. The lines with an hour of only 1 digit are indexed somewhere in 2010 - so the time stamp is not recognized correctly.

Has anyone an idea to fix this?


I already checked the "strptime" function but in the manual it says "%H is the hour (24-hour clock) [0,23]; leading zeros are permitted but not required."


Well - in my case they seem to be required


My props.conf:

[cp]

CHECK_FOR_HEADER = true

TIME_FORMAT=%d%b%Y;%H:%M:%S



1652;30Nov2011;23:59:58;192.168.249.2;log;drop;
1654;30Nov2011;23:59:59;192.168.249.2;log;drop;
1710;30Nov2011;23:59:58;192.168.249.22;log;drop;
1990;1Dec2011;0:00:00;192.168.249.2;log;drop;
1967;1Dec2011;0:00:01;192.168.249.12;log;drop;
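
As a point of comparison outside Splunk, Python's `datetime.strptime` also accepts `%H` with or without a leading zero, so the same format string parses both kinds of sample line. This is only a sketch: it says nothing about how Splunk's own timestamp processor behaves, and `parse_cp_line` is a hypothetical helper name used for illustration.

```python
from datetime import datetime

# Same format string as TIME_FORMAT in the props.conf above.
# Assumption: month abbreviations are parsed in an English/C locale.
TIME_FORMAT = "%d%b%Y;%H:%M:%S"

def parse_cp_line(line):
    """Parse the date and time fields (2nd and 3rd) of one export line."""
    fields = line.split(";")
    return datetime.strptime(fields[1] + ";" + fields[2], TIME_FORMAT)

# One line with a two-digit hour, one with a single-digit hour.
print(parse_cp_line("1652;30Nov2011;23:59:58;192.168.249.2;log;drop;"))
print(parse_cp_line("1990;1Dec2011;0:00:00;192.168.249.2;log;drop;"))
```

If both lines parse here, the format string itself is at least consistent with the manual's wording, which points toward the pattern matching done before strptime rather than `%H`.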

1 Solution

Splunk Employee

This is common with some uses of IBM WebSphere. I have seen timestamps that look like this:

[12/14/11 1:00:00:115 PST]  hello 
[12/14/11 1:00:00:117 PST]  goodbye 
[12/14/11 1:00:00:114 PST]  whatever 
[12/14/11 1:08:00:117 PST]  super 
[12/14/11 0:07:00:113]  star 
[12/14/11 0:06:00:117 PST]  who  
[12/14/11 0:04:00:118 PST]  cares

(notice above, some have timezone, and some do not)

In this case, a custom "datetime.xml" will solve it. ($SPLUNK_HOME/etc/datetime.xml has all the default config for timestamp extraction patterns.) It's not rocket science to make your own; you just have to write a simple regex for it.

You'll need to edit two files: "props.conf", which you may already edit from time to time, and a file that contains the new datetime config. In this case we'll call it "ninjadatetime.xml".

props.conf needs to reference the location of "ninjadatetime.xml" in the DATETIME_CONFIG setting. Splunk will then ignore its defaults for that sourcetype and use the new patterns we've created.

ninjadatetime.xml defines the order in which Splunk assigns the parts of a date and a time (the "extract" attribute), together with the regex that matches and captures each component of the date and the time.
If your events have no timezone, you may also want to set one in props.conf (as I have below).

FILE -> props.conf

[mysourcetype]
DATETIME_CONFIG = /etc/apps/search/local/ninjadatetime.xml
TIME_FORMAT = %m/%d/%y %k:%M:%S:%3f
TZ = America/Chicago

FILE -> ninjadatetime.xml

<datetime>
   <!-- we're using Splunk's default timezone extraction regex below-->
<define name="_zone" extract="zone">
         <text><![CDATA[((?:(?:UT|UTC|GMT(?![+-])|CET|CEST|CETDST|MET|MEST|METDST|MEZ|MESZ|EET|EEST|EETDST|WET|WEST|WETDST|MSK|MSD|IST|JST|KST|HKT|AST|ADT|EST|EDT|CST|CDT|MST|MDT|PST|PDT|CAST|CADT|EAST|EADT|WAST|WADT|Z)|(?:GMT)?[+-]\d\d?:?(?:\d\d)?)(?!\w?))?]]></text>
</define>

  <!--this pattern captures all of the time/date info, and then uses the above patterns to gather timezone.-->
<define name="_wsdatewzone" extract="month, day, year,hour,minute,second,subsecond,zone">
    <text><![CDATA[(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+):(\d+)]]></text>
        <text><![CDATA[\s+]]></text>
    <use name="_zone"/>
</define>

  <!--this pattern captures all of the time/date info but no timezone as one is not present-->
<define name="_wsdatenozone" extract="month, day, year,hour,minute,second,subsecond">
        <text><![CDATA[(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+):(\d+)]]></text>
</define>

<timePatterns>
      <use name="_wsdatewzone"/>
      <use name="_wsdatenozone"/>
</timePatterns>

<datePatterns>
      <use name="_wsdatewzone"/>
      <use name="_wsdatenozone"/>

</datePatterns>

</datetime>
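
To check a pattern like this before deploying it, the regex can be run against the sample events outside Splunk. The sketch below uses Python's `re` module rather than Splunk's own engine, and `extract` is a hypothetical helper. It pairs each capture group with the name at the same position in the `extract="..."` attribute of `_wsdatenozone`.

```python
import re

# Same regex as in the two <define> blocks above.
WS_PATTERN = re.compile(r"(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+):(\d+)")
# Order taken from the extract="..." attribute of _wsdatenozone.
FIELDS = ("month", "day", "year", "hour", "minute", "second", "subsecond")

def extract(line):
    """Return the captured timestamp parts by name, or None if no match."""
    m = WS_PATTERN.search(line)
    return dict(zip(FIELDS, m.groups())) if m else None

samples = [
    "[12/14/11 1:00:00:115 PST]  hello",
    "[12/14/11 0:07:00:113]  star",
]
for s in samples:
    print(extract(s))
```

Because every component is matched with `\d+`, a single-digit hour such as `0` or `1` is captured the same way as a two-digit one.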


Communicator

Hi,
I'm trying to do something similar to this, but I want to extract the time and date from my source path.
I've tried modifying datetime.xml, following this example but changing the regex to match my format.
This is what it looks like, but I can't get it to work. Does anyone have any suggestions?

<define name="_wsdatenozone" extract="year, month, day,hour,minute,second">
        <text><![CDATA[source::.*?(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})]]></text>
</define>

<timePatterns>
      <use name="_wsdatenozone"/>
</timePatterns>

<datePatterns>
      <use name="_wsdatenozone"/>
</datePatterns>

Edit: My timestamps are of the form:

C:\Users\angeliga\Filer\379177\Report_2013-05-21_16-49-29\Server\file
where the timestamp is 2013-05-21_16-49-29 (YYYY-MM-DD_hh-mm-ss)
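
A quick way to narrow this down is to test the regex on its own. The sketch below uses Python's `re` module, not Splunk, and assumes the text being matched consists of the `source::` key followed by the path; whether Splunk presents the source to the timestamp processor that way is not confirmed in this thread.

```python
import re

# Regex copied from the <define> block above.
PATTERN = re.compile(r"source::.*?(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})")
FIELDS = ("year", "month", "day", "hour", "minute", "second")

path = r"C:\Users\angeliga\Filer\379177\Report_2013-05-21_16-49-29\Server\file"

# Assumed input shape: "source::" followed by the source path.
match = PATTERN.search("source::" + path)
parts = dict(zip(FIELDS, match.groups())) if match else None
print(parts)

# Without the "source::" prefix the same path does not match.
print(PATTERN.search(path))
```

If the regex matches in isolation, the problem is more likely in how the pattern is wired in (for example the DATETIME_CONFIG path or the sourcetype stanza) than in the expression itself.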

Communicator

I tried your suggestion, but it won't work 😕
Thanks anyway


Contributor

I don't know for sure, but have you tried putting the "\d" in brackets, like "[\d]{4}"?
I would also escape the "-" symbols, like this: "\-".
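
For what it's worth, both changes can be compared against the original expression outside Splunk. This is a sketch in Python's `re` module (Splunk uses PCRE, where `[\d]` and `\-` should carry the same meaning); the `source::.*?` prefix is left out to keep it short.

```python
import re

# Date/time part of the original regex.
original = re.compile(r"(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})")
# Variant with the suggested changes: \d wrapped in brackets, hyphens escaped.
suggested = re.compile(r"([\d]{4})\-([\d]{2})\-([\d]{2})_([\d]{2})\-([\d]{2})\-([\d]{2})")

text = "Report_2013-05-21_16-49-29"
print(original.search(text).groups())
print(suggested.search(text).groups())
```

Both expressions capture the same groups, so these changes alone would not be expected to alter the result.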


Communicator

Something like C:\Users\angeliga\Filer\379177\Report_2013-05-21_16-49-29\Server\file

Where the timestamp is 2013-05-21_16-49-29 (YYYY-MM-DD_hh-mm-ss)


Contributor

What do your timestamps look like?


Path Finder

I recently encountered the same issue with some WebSphere logs.

An easy solution I came up with is to modify the default datetime.xml in $SPLUNK_HOME/etc/

The only modification you need to make is to change the hour detection (ie, in the section labelled define name="_hour" extract="hour") from this:

<![CDATA[([01]?[1-9]|[012][0-3])(?!\d)]]>

to this:

<![CDATA[([01]?[0-9]|[012][0-3])(?!\d)]]>

Then it successfully picks up the 0:xx:xx:xxx event timestamps. I have not found a situation where this has caused side effects so far.
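
The difference between the two hour patterns can be checked outside Splunk. The sketch below uses Python's `re` module, and `hour_of` is a hypothetical helper; it applies each pattern at the start of a few time strings taken from this thread.

```python
import re

# Hour pattern as quoted from the default datetime.xml, and the modified one.
DEFAULT_HOUR = re.compile(r"([01]?[1-9]|[012][0-3])(?!\d)")
MODIFIED_HOUR = re.compile(r"([01]?[0-9]|[012][0-3])(?!\d)")

def hour_of(pattern, time_text):
    """Return the hour matched at the start of time_text, or None."""
    m = pattern.match(time_text)
    return m.group(1) if m else None

# Columns: input, default pattern result, modified pattern result.
for t in ("0:07:00:113", "1:00:00:115", "00:12:53", "10:00:00", "23:59:58"):
    print(t, hour_of(DEFAULT_HOUR, t), hour_of(MODIFIED_HOUR, t))
```

The default pattern has no branch that accepts a lone `0`: the first alternative requires a final digit in `[1-9]`, and the second requires two digits. That fits the reports in this thread that `0:xx:xx` fails while `00:xx:xx` and `1:xx:xx` work.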

Legend

Excellent answer!


Splunk Employee

Tat-Wee.. I fixed it to accommodate events that do and do not have a timezone.


Splunk Employee

This is very nice, thanks for the effort! I haven't had the time to get to the root cause of the issue, and suspected it could be the datetime.xml, and you beat me to it.


Splunk Employee

Yes, for some reason I am encountering this problem as well, and it is puzzling me. Events whose timestamp has a single-digit hour, e.g. 0:12:53, get their timestamp extracted wrongly. But if I change them to a double-digit hour, e.g. 00:12:53, they are fine.
