Setting date on event based on filename

snowmizer
Communicator

I'm trying to load one of my logs from my phone server into Splunk. Splunk will read the log file and break the events correctly.

My problem is with the date that Splunk places on the event. The events in my log have a time but no date; the date appears only in the filename. When I look at the loaded events in Splunk, the time is extracted properly, but the date is the next day (because of the modified date on the file).

I've copied the existing datetime.xml into /etc/system/local and added an extraction for my date from my filename. How can I pull the date from the filename into Splunk as the date on my event?

File name: d:...\VMail-110212-000000.Log

Raw events from d:...\VMail-110212-000000.Log (Splunk shows the date as 2/13/11 because of the file modified date):

23:59:57.334 ( 4172: 5540) [MS] Entering Process MWI loop
23:59:57.334 ( 4172: 5540) [MS] Process MWI loop return, no more entries left

Datetime.xml:

<datetime>
    <define name="_masheddate3" extract="year, month, day">
    <text><![CDATA[source::.*?\\Vmail-\d{2}\d{2}\d{2}.*\.Log]]></text>
    </define>
    ...

    <datepatterns>
    ...
        <use name="_masheddate3"/>
    </datepatterns>

</datetime>

Props.conf

[ShoreTel_VMail]
SHOULD_LINEMERGE = FALSE
DATETIME_CONFIG = \etc\system\local\datetime.xml
TRANSFORMS-shoretel-comment = shoretel_comment

Inputs.conf

[monitor://D:\Sample Logs\Shoretel_Current]
disabled=false
host=Shoretel
sourcetype=ShoreTel_VMail
crcSalt=<SOURCE>
index=shoretel

hexx
Splunk Employee

What is going on here?

This occurs because the time stamp of the events in your source file is incomplete (it has a time but no date), which leaves Splunk to guess which date to assign to each new event.

In your case, I am fairly certain that the following occurs:

  • Splunk discovers the file "VMail-110212-000000.Log" and reads the first event, which contains the partial time stamp "23:59:57.334".
  • From this partial time stamp, Splunk deduces that the date is February 12th 2011 (using the file name and your custom datetime.xml regex) and that the time is "23:59:57.334".
  • If the next event comes in with a time of "00:01:12.232", for example, Splunk will deduce that we have moved to the next day and will time stamp that event as "February 13th 2011 00:01:12.232".

As a result, Splunk will index the file as if it spanned several days. Note that this will also occur whenever the time of an event is earlier than the time of the event before it. For example, an event recorded with a time of "10:28:59" following an event with a time of "10:29:12" will also trigger an increase of 1 for the value of date_mday (the internal field where Splunk stores the value of the day of the month).
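
To make the behavior described above concrete, here is a small Python sketch. It is only a toy model of the guessing I described, not Splunk's actual time stamp processor: the file name pattern, the function names and the "bump the day whenever the clock goes backwards" rule are all assumptions made for illustration.

```python
import re
from datetime import date, datetime, time, timedelta

# Hypothetical pattern for names like "VMail-110212-000000.Log" (YYMMDD).
# Note the capture groups: one each for year, month and day.
FILENAME_DATE = re.compile(r"VMail-(\d{2})(\d{2})(\d{2})-\d{6}\.Log$", re.IGNORECASE)


def date_from_filename(path: str) -> date:
    """Read the YYMMDD date embedded in the file name."""
    match = FILENAME_DATE.search(path)
    if match is None:
        raise ValueError(f"no date found in {path!r}")
    yy, mm, dd = (int(group) for group in match.groups())
    return date(2000 + yy, mm, dd)  # assumes two-digit years mean 20xx


def assign_dates(start: date, times: list[str]) -> list[datetime]:
    """Toy model: move to the next day whenever the clock time goes backwards."""
    current_day = start
    previous = None
    stamped = []
    for text in times:
        clock = time.fromisoformat(text)
        if previous is not None and clock < previous:
            current_day += timedelta(days=1)  # the date_mday increase
        stamped.append(datetime.combine(current_day, clock))
        previous = clock
    return stamped


if __name__ == "__main__":
    start = date_from_filename("VMail-110212-000000.Log")
    for stamp in assign_dates(start, ["23:59:57.334", "00:01:12.232"]):
        print(stamp.isoformat())
```

In this model the second event lands on February 13th, and the out-of-order pair "10:29:12" then "10:28:59" produces the same one-day jump.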

How to prevent this?

As of 4.1.x, there is no way to prevent this from happening when you are indexing historical data (i.e., files that contain events not from the current day).

If you are indexing current events, you can add the parameter "MAX_DAYS_HENCE = 0" to the props.conf stanza for this source/sourcetype to prevent Splunk from increasing the value of date_mday beyond the current day.

In 4.2, Splunk will honor the MAX_DIFF_SECS_AGO parameter even for incomplete time stamps (which is not the case in 4.1.x), and you will be able to use that parameter to prevent date_mday increases for both current and historical files. From props.conf.spec:

MAX_DIFF_SECS_AGO = <integer>
* If the event's timestamp is more than <integer> seconds BEFORE the previous timestamp, only accept it if it has the same exact time format as the majority of timestamps from the source.
* IMPORTANT: If your timestamps are wildly out of order, consider increasing this value.
* Defaults to 3600 (one hour).
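
As a rough illustration of the condition in that spec text, the Python sketch below checks whether one time stamp is more than a given number of seconds before the previous one. It models only the threshold comparison; the function name is hypothetical, and what Splunk does once the condition is met (the "same exact time format" check) is not modeled here.

```python
from datetime import datetime

DEFAULT_MAX_DIFF_SECS_AGO = 3600  # default quoted from props.conf.spec


def exceeds_max_diff_secs_ago(
    previous: datetime,
    current: datetime,
    max_diff_secs_ago: int = DEFAULT_MAX_DIFF_SECS_AGO,
) -> bool:
    """Return True when `current` is more than the threshold BEFORE `previous`."""
    seconds_before = (previous - current).total_seconds()
    return seconds_before > max_diff_secs_ago


if __name__ == "__main__":
    previous = datetime(2011, 2, 12, 10, 29, 12)
    slightly_earlier = datetime(2011, 2, 12, 10, 28, 59)
    print(exceeds_max_diff_secs_ago(previous, slightly_earlier))
```

With the default of 3600, the 13-second step backwards from "10:29:12" to "10:28:59" stays under the threshold.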

snowmizer
Communicator

Thanks for the explanation and help hexx.
