I have an odd problem with time extraction from some CSV files. I specify the time format using the following:
TIME_FORMAT = %m:%d %Y %H:%M:%S:%Q
and it generally works. I am not convinced it is the correct format to use, but it seems to work (most of the time). It works fine with times like this:
07:30 2013 12:59:57:0
and it does not work with times that look like this:
01:01 2000 00:00:03:0
With these times, it seems that Splunk chooses to use the file creation time (actually it's a bit more complex than that: from what I can tell, it uses the file creation date and the time from when the file was last modified). I know that data from the year 2000 is generally not relevant,
but with the data I am looking at, there are scenarios where the date has not yet been properly initialized, and this is the time the clock starts up with.
I want to be able to filter these invalid dates out using the time specification, but do not want to throw away the data with invalid dates (it may hold useful diagnostic value).
I suppose one answer to this might be that Splunk does not recognize times from the year 2000, but I am not sure why that would be.
BTW, this data is being monitored on a Windows machine with a Universal Forwarder and the indexer is on a Linux machine.
Can anyone explain what might be happening here?
This is probably because Splunk thinks your events are "too old" to trust that the timestamps have really been parsed correctly. From the props.conf docs:
MAX_DAYS_AGO = <integer>
* Specifies the maximum number of days past, from the current date, that an extracted date can be valid.
* For example, if MAX_DAYS_AGO = 10, Splunk ignores dates that are older than 10 days ago.
* Defaults to 2000 (days), maximum 10951.
* IMPORTANT: If your data is older than 2000 days, increase this setting.
So, set MAX_DAYS_AGO to something high enough in props.conf and you should be good to go.
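As a rough sketch of what that could look like: the stanza name my_csv_data below is made up, so use whatever sourcetype (or source:: pattern) your CSV files actually come in under. Assuming the data is being indexed around mid-2013, as your working sample suggests, January 2000 is somewhere around 4,950 days back, well past the 2000-day default. The value shown is simply the documented maximum; anything comfortably larger than the real age of the data should do.

```
# props.conf
# [my_csv_data] is a hypothetical stanza name; substitute your own sourcetype or source:: pattern
[my_csv_data]
# Same format string as in the question
TIME_FORMAT = %m:%d %Y %H:%M:%S:%Q
# Accept extracted timestamps up to the documented maximum age (about 30 years)
MAX_DAYS_AGO = 10951
```

Keep in mind that this is an index-time setting, so it only affects data indexed after the change. Events that already picked up the fallback timestamp would need to be re-indexed to get the corrected time.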