Hi,
I have a log file that looks something like this:
2018-03-06 00:30 abc
00:40 def
01:40 ghi
03:40 jkl
09:40 mno
20:30 pqr
02:30 stu
04:50 vwx
07:10 xy
So the date of the events is only given once, at the beginning of the file, and everything that follows has only a time in hours and minutes.
For me it is clear that the events after "abc" also happened on 2018-03-06, up until "stu", which happened on 2018-03-07, but that's based more on experience than on rules.
How can I tell Splunk to interpret this file the same way I do? With my current (i.e. default) settings, the timestamps are interpreted as follows:
3/6/18 12:30:00.000 AM 2018-03-06 00:30 abc
3/6/18 12:40:00.000 AM 00:40 def
3/6/18 1:40:00.000 AM 01:40 ghi
3/6/18 3:40:00.000 AM 03:40 jkl
3/5/18 9:40:00.000 AM 09:40 mno
3/4/18 8:30:00.000 PM 20:30 pqr
3/4/18 2:30:00.000 AM 02:30 stu
3/4/18 4:50:00.000 AM 04:50 vwx
3/4/18 7:10:00.000 AM 07:10 xy
So at every larger gap, the date goes back one day.
Also, the transition from 20:30 to 02:30 goes back 18 hours instead of going forward 6 hours.
Can you help me with that?
BTW, there is nothing meaningful in the filename and we cannot use the modification date of the file, so I guess we are stuck with the date at the top of the file.
Do you have any control over how the logs are written? Can you convince the developers to use a saner time format?
Unfortunately no. There is already a bunch of log files written in this format that we want to analyse, as well as a requirement to analyse new ones. We may be lucky with the new files, but we still need a solution for the old ones. Preferably with one sourcetype, but that's not set in stone.
For old log files, you could also create a small script that runs through the lines, interprets the timestamps, and adds the relevant date info at the start of each line.
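A minimal sketch in Python, assuming the first line starts with the date as in your sample, and that a time lower than the previous one always means a rollover into the next day:

#!/usr/bin/env python
# Sketch: prepend the full date to every line of a log whose date
# appears only on the first line ("2018-03-06 00:30 abc ...").
import re
import sys
from datetime import datetime, timedelta

def fix_dates(lines):
    # The first line carries the only date: take it from there.
    first = re.match(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})", lines[0])
    if not first:
        raise ValueError("first line carries no date")
    current_date = datetime.strptime(first.group(1), "%Y-%m-%d").date()
    prev_time = datetime.strptime(first.group(2), "%H:%M").time()
    yield lines[0]  # this line already has its date
    for line in lines[1:]:
        m = re.match(r"(\d{2}:\d{2}) ", line)
        if not m:
            yield line  # no leading time, pass through untouched
            continue
        t = datetime.strptime(m.group(1), "%H:%M").time()
        if t < prev_time:  # time went backwards: midnight rollover
            current_date += timedelta(days=1)
        prev_time = t
        yield "%s %s" % (current_date.isoformat(), line)

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for out in fix_dates(f.read().splitlines()):
            print(out)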
@gschr, how many lines might you have in a day's log? What is the format of a typical log file name? Is it a generic name that gets rolled over/recreated, or does it have a timestamp in the file name as well? Is there meaningful data in the first line of the log 2018-03-06 00:30 abc
(i.e. is abc also important for analysis)? Are all events single-line, or are there multi-line events as well?
@niketnilay at the moment I only have one example; it has about 1000 lines, but that may vary a lot. It is kind of an IoT scenario, which makes it even harder to change the log format, I guess. Also, I still need to convince those guys to use Splunk 😉 The name of the log file equals the name of the device that logged the data. The file gets rolled more or less daily, but it still contains some events from the next day.
The first line actually says something like
Logging started at 2018-03-06 00:30
(yes, it doesn't even start with the timestamp; I left that out for simplicity), so there's no need to keep that line other than for determining the correct date.
The events are all single-line.
@FrankVI I also thought about a script that transforms the data, but then I would lose the ability to monitor the log file.
I've heard about unarchive_cmd (https://www.splunk.com/blog/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles.html), which could possibly be fed every new line that appears in the file so it can add the correct date. The logs are not binary, but anyway... it may work, but it is really not that easy.
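If I read that post correctly, the wiring would be something like this (paths and names made up by me, untested):

# props.conf -- untested sketch, based on my reading of the blog post
[source::/var/log/iot/*.log]
sourcetype = iot_fixed
# per the docs, the command gets the raw file on stdin and must
# write the rewritten lines to stdout
unarchive_cmd = python /opt/splunk/etc/apps/iot/bin/fix_dates.py

[iot_fixed]
# unarchive_cmd is reportedly only invoked when invalid_cause
# is set to 'archive'
invalid_cause = archive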
I'd like to understand what settings cause Splunk to behave that way. I've seen MAX_DIFF_SECS_AGO and MAX_DIFF_SECS_HENCE in props.conf, but they don't seem to change anything here.
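For reference, this is roughly what I experimented with in props.conf (the sourcetype name and values are my own guesses):

# props.conf -- my experiments so far, none of which changed the result
[iot_device_log]
TIME_FORMAT = %H:%M
MAX_TIMESTAMP_LOOKAHEAD = 20
MAX_DIFF_SECS_AGO = 86400
MAX_DIFF_SECS_HENCE = 86400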
You mentioned that for new logs you might be able to get the source system changed to log proper timestamps. I think chasing that is where I would put my energy.
And then a script (like the sketch above) for transforming the backlog of existing files before feeding them to Splunk.
If you can't get the source system changed, you could also look at solving this with a scripted input that runs e.g. every 5 minutes, parses the latest lines, and feeds them into Splunk. Not as nice as a proper 'real-time' file monitor, but it might be good enough for your use case.
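The input side of that could look roughly like this in inputs.conf (script path and sourcetype are placeholders; the script itself would be a variant of the sketch above that remembers how far into the file it already read):

# inputs.conf -- placeholders, adjust to your app
[script://$SPLUNK_HOME/etc/apps/iot/bin/tail_and_fix.py]
interval = 300
sourcetype = iot_device_log
disabled = false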