Extracting a timestamp from a log with one date and multiple time fields

Contributor

Hi,

I am unable to extract a valid _time from the following log:

0168 004 07:59:03 09:01:35 0062 asdfghj ee bonfanyti Y                                             P1233443P       443386 0012 07:59:17    dial_in  1                                  1234 N N                                                       34567654555 000523456778 0000 09/20/10  0                                                                                                                                                                                                 1624443                                          01
0344 003 07:58:33 09:01:36 0063 Ssdfas Fd asdfffftim Y                                             P5243343P       455483 0032 07:58:48    dial_in  1                                  7950 N N                                                                   000234234218 0000 09/20/10  0                                                                                                                                                                                                 1624443                                          01
0433 007 08:00:14 09:01:36 0061 ewrwreerer asdfsdfff N                                             P5243443P       451333 0061 08:00:30    dial_in 19                                  7952 N N                                                       58916588270 000522349181 0000 09/20/10  0                                                                                                                                                                                                 5673443                                          01

The timestamps I would like to extract are:

1) 09/20/10 07:59:03

2) 09/20/10 07:58:33

3) 09/20/10 08:00:14

From reading the documentation, I have figured out that I can only extract it using a custom datetime.xml.

I have tried to construct a datetime.xml:

<datetime>
<define name="ccm_1_date" extract="month,day,year,">
    <text><![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]></text>
  </define>
  <define name="ccm_1_time" extract="second,minute,hour,">
    <text><![CDATA[\s\d+:\d+:\d+\s]]></text>
  </define>

  <timePatterns>
    <use name="ccm_1_time"/>
  </timePatterns>
  <datePatterns>
    <use name="ccm_1_date"/>
  </datePatterns>

</datetime>

The date pattern is probably good, but the time pattern is suspicious.

props.conf:


[host::ccm]

SHOULD_LINEMERGE = false

DATETIME_CONFIG = /etc/apps/search/local/datetime.xml

MAX_TIMESTAMP_LOOKAHEAD = 300

Any help would be appreciated.

Splunk Employee

Try the following datetime.xml:

<datetime>
    <define name="ccm_1_date" extract="month,day,year">
        <text><![CDATA[0000\s(\d{2})/(\d{2})/(\d{2})]]></text>
    </define>
    <define name="ccm_1_time" extract="hour,minute,second">
        <text><![CDATA[\*\*\*\s(\d{2}):(\d{2}):(\d{2})]]></text>
    </define>
    <define name="ccm_2_time" extract="hour,minute,second">
        <text><![CDATA[\d{3}\s(\d{2}):(\d{2}):(\d{2})]]></text>
    </define>
    <timePatterns>
      <use name="ccm_1_time"/> 
      <use name="ccm_2_time"/>
    </timePatterns>
    <datePatterns>
      <use name="ccm_1_date"/> 
    </datePatterns>
</datetime>      

Also, I don't recommend editing the default datetime.xml, because your changes will be overwritten during an upgrade. Give your file a different name, such as datetime2.xml, and point to it in props.conf with DATETIME_CONFIG, e.g.:

[extracttime]
SHOULD_LINEMERGE = false
DATETIME_CONFIG = \etc\garfield.xml
MAX_TIMESTAMP_LOOKAHEAD = 1000
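
If you want to sanity-check the patterns outside Splunk first, something like the following rough Python sketch may help. It uses the date pattern from above and a time pattern of the form "three digits, whitespace, HH:MM:SS". Assumptions on my side: the sample line is the first event from the question with its whitespace runs shortened, the two-digit year is taken to mean 20xx, and Python's re module is only an approximation of how Splunk applies datetime.xml.

```python
import re
from datetime import datetime

# First sample event from the question; long whitespace runs are shortened here.
SAMPLE = (
    "0168 004 07:59:03 09:01:35 0062 asdfghj ee bonfanyti Y P1233443P "
    "443386 0012 07:59:17 dial_in 1 1234 N N 34567654555 000523456778 "
    "0000 09/20/10 0 1624443 01"
)

# Date: the literal 0000 field, one whitespace character, then MM/DD/YY.
DATE_RE = re.compile(r"0000\s(\d{2})/(\d{2})/(\d{2})")
# Time: three digits, one whitespace character, then HH:MM:SS.
TIME_RE = re.compile(r"\d{3}\s(\d{2}):(\d{2}):(\d{2})")


def extract_timestamp(line):
    """Return a datetime built from the first date and time match, or None."""
    date_m = DATE_RE.search(line)
    time_m = TIME_RE.search(line)
    if not (date_m and time_m):
        return None
    month, day, year = map(int, date_m.groups())
    hour, minute, second = map(int, time_m.groups())
    # Assumption: two-digit years belong to the 2000s.
    return datetime(2000 + year, month, day, hour, minute, second)


print(extract_timestamp(SAMPLE))  # 2010-09-20 07:59:03
```

If this prints the timestamp you expect but Splunk still falls back to another time, the regexes are probably fine and the problem is more likely in how props.conf points to the file.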

Motivator

Your regex in ccm_1_time does not capture any groups.

Try adding parentheses to capture each value, and make sure that the timestamp regex only matches the first set of colon-delimited digits...

<datetime>
    <define name="ccm_1_date" extract="day,month,year,">
        <text><![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]></text>
    </define>
    <define name="ccm_1_time" extract="hour,minute,second,">
        <text><![CDATA[^(?:\d+\s)+(\d+):(\d+):(\d+)\s]]></text>
    </define>

    <timePatterns>
        <use name="ccm_1_time"/>
    </timePatterns>
    <datePatterns>
        <use name="ccm_1_date"/>
    </datePatterns>
</datetime>
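
To see the difference the parentheses and the ^ anchor make, here is a small Python sketch run against the start of the first sample event. It is only an illustration with Python's re module, which I am assuming behaves closely enough to Splunk's regex engine for these simple patterns.

```python
import re

# Start of the first sample event from the question.
SAMPLE = "0168 004 07:59:03 09:01:35 0062 asdfghj ee bonfanyti Y"

# Time pattern from the question: it matches, but has no capture groups.
NO_GROUPS = re.compile(r"\s\d+:\d+:\d+\s")
# Anchored pattern with one capture group per value.
ANCHORED = re.compile(r"^(?:\d+\s)+(\d+):(\d+):(\d+)\s")

# A match exists, yet there is nothing to assign to hour/minute/second.
print(NO_GROUPS.search(SAMPLE).groups())  # ()

# The anchor skips the leading numeric fields and stops at the first time.
print(ANCHORED.search(SAMPLE).groups())   # ('07', '59', '03')
```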

Motivator

The order is still wrong; it would need to be "day,month,year,". The regex looks like it should match, but in your sample data the second part is 20, which isn't a valid month.

Contributor

Found a typo; the correct time line is:
<![CDATA[^(?:\d+\s)+(\d+):(\d+):(\d+)\s]]>

Now the time part is recognised correctly, but the date part is still not working as it should. What could be the problem with:


<![CDATA[\s+\d+\s(\d+)/(\d+)/(\d+)]]>

Contributor

Modified accordingly; _time is again 9/23/10 9:50:47.000 PM.

Motivator

Didn't notice the field order; I've corrected it in the answer above.

Splunk Employee

Also, the names in the extract attribute must be in the same order as the capture groups. Use hour,minute,second instead of second,minute,hour.
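
A quick way to picture this is to pair the extract names with the capture groups by position. The Python sketch below is only a model of that pairing, not Splunk's actual parser; the helper label_groups is hypothetical, and dropping the empty name left by a trailing comma is my assumption for the illustration.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})")


def label_groups(extract, text):
    """Pair each name in an extract list with the capture group at the same position."""
    # Hypothetical handling: ignore the empty name produced by a trailing comma.
    names = [name for name in extract.split(",") if name]
    match = TIME_RE.search(text)
    return dict(zip(names, match.groups()))


# Order from the question: the first group (the hour) gets labelled as seconds.
print(label_groups("second,minute,hour,", "07:59:03"))
# {'second': '07', 'minute': '59', 'hour': '03'}

# Order that follows the capture groups.
print(label_groups("hour,minute,second", "07:59:03"))
# {'hour': '07', 'minute': '59', 'second': '03'}
```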

Splunk Employee

Because if Splunk fails to get a date or time from the data, it next tries the file/source name, and then the modification time of the file. See http://www.splunk.com/base/Documentation/latest/Admin/HowSplunkextractstimestamps#Precedence_rules_f...

Contributor

9/23/10 9:50:47.000 PM is the time of the last modification of the log file. Why is it used instead of the intended fields?

Contributor

No, it does not. Every event has the same _time value:
9/23/10 9:50:47.000 PM
