I have a logfile with two different date formats for entries. Unfortunately, the dates written to the logfile are "underspecified", and the automatic date extraction is getting them wrong. The following examples are from log.20101010 (log.YYYYMMDD, UTC date), but the individual entries get scattered across several days when they're indexed:
The following ends up on October first:
-- 10/10 01:59:50 [blah, blah, blah]
This one ends up on October second:
-- 10/10 02:57:07 [blah, blah, blah]
Another format ended up in the correct place today, but only because the 2-digit year happens to match the month and day, so it doesn't matter if they get swapped around:
INFO 10/10/10 02:57:07
The extracted times are all OK.
I've looked at the documentation, and it looks like the precedence rules for extracting dates and times use transforms/extractions from the individual records first, then look for rules to derive the date from the filename.
How can I force the DATE on the indexed record to be derived from the filename? I suspect that datetime.xml may be involved somehow. As an added complication, the logfile is read by a local (not lightweight) agent, passed to a forwarder, and then spread across 4 index servers - so I need to get the right machines as well as the right file(s) to update.
If you are dealing with a heavyweight forwarder, then where to put the config files is very simple. All of the parsing and event processing is done on the forwarder, so that's where all the config changes need to be made; the indexers will simply use whatever date the forwarder selects.
You are correct that creating a custom datetime.xml does seem to be your only option here. If your app doesn't produce consistent timestamps, then the standard TIME_FORMAT config option is ruled out.
A little more info would help here, ideally a list of sample timestamps and how each should be interpreted. Specifically, do you ever have to deal with both "mm/dd" and "dd/mm" at the same time? Or is it more an issue where the year component is sometimes left off?
You should be able to copy datetime.xml to a custom app, then modify it to eliminate all but the specific timestamp formats that your log file uses. If you don't know all the formats that could be used, then that is a very unfortunate situation, and as you suspect, the source-based approach may be where you have to turn.
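To make that concrete, here is a rough sketch of the props.conf stanza that would point one source at a trimmed-down copy of datetime.xml. The app name (my_app) and the source pattern are made up for illustration, so adjust both to match your setup. The DATETIME_CONFIG path is relative to $SPLUNK_HOME, and with a heavyweight forwarder this goes on the forwarder.

```
# props.conf (on the machine that does the parsing)
# "my_app" and the source pattern below are hypothetical placeholders.
[source::.../log.*]
# Path is relative to $SPLUNK_HOME; this file is your edited copy of datetime.xml
DATETIME_CONFIG = /etc/apps/my_app/local/datetime.xml
```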
Of course, another option is to disable timestamp recognition completely and have Splunk timestamp your events as they are loaded. This could work quite well if you have a minimal indexing delay. It's possible that your events' timestamps would only be off by a few seconds or so, which is much better than being off by a day or more. (Set DATETIME_CONFIG = CURRENT in props.conf to try this out.)
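For reference, the props.conf setting that controls this is DATETIME_CONFIG. A minimal sketch of the stanza, assuming the same kind of source pattern as your log.YYYYMMDD files (the pattern itself is a guess, so adjust it to match your actual source paths):

```
# props.conf (on the machine that does the parsing)
# The source pattern is a hypothetical placeholder.
[source::.../log.*]
# Skip timestamp extraction and stamp each event with the current system time
DATETIME_CONFIG = CURRENT
```

Keep in mind that this throws away the time recorded in the event as well as the date, so it only makes sense if the file is being read close to real time.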