I have a particularly challenging log format and would appreciate any input on how to tackle this problem.
I am looking for a feasible props.conf setup that will correctly index the log below.
Example (blank lines only added for readability):
SINGLE_LINE_LOG_EVENT
SINGLE_LINE_LOG_EVENT
OTHER_SINGLE_LINE_LOG_EVENT
Tue 06 Jun 10:00:00 UTC 2023
ANOTHER_SINGLE_LINE_LOG_EVENT
Tue 06 Jun 10:00:01 UTC 2023
LARGE_MULTILINE_EVENT
The first three lines are all single events and should be parsed accordingly, but they have no timestamp.
The fourth and fifth lines together form a single event.
Lines 6 and 7 also form a single event, but the event on line 7 is a multiline event whose lines should all be parsed as that one event.
If there is no other solution, I am prepared to accept that the lines without a timestamp are assigned the CURRENT timestamp.
I tried the following (the regex is meant to match the timestamp):
MUST_NOT_BREAK_AFTER = .{3}\s.{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\sUTC\s\d{4}
MUST_BREAK_AFTER = .{3}\s.{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\sUTC\s\d{4}
I also tried the following, in various combinations with different capture groups (note that the file in question only contains newlines and no carriage returns, hence no '\r'):
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\n].{3}\s.{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\sUCT\s\d{4})
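To see why these attempts are hard to get right, here is a rough Python model of how LINE_BREAKER behaves with SHOULD_LINEMERGE = false: every regex match is an event boundary, and the text captured by the first group is discarded. It is an approximation, not Splunk's implementation, and it assumes the sample timestamps above are verbatim (they may have been anonymized). Run against the sample, the pattern as posted matches nothing at all, both because of the UCT/UTC mismatch and because `.{3}\s.{3}\s\d{2}` expects the `Tue Jun 06` order while the sample shows `Tue 06 Jun`. Putting the timestamp inside the capture group throws the timestamp away, and capturing only the newline (with a lookahead for the timestamp) keeps the timestamps but still merges the three leading untimestamped lines into one event, because nothing marks a boundary between them.

```python
import re

# The sample log from the question (blank lines removed).
SAMPLE = (
    "SINGLE_LINE_LOG_EVENT\n"
    "SINGLE_LINE_LOG_EVENT\n"
    "OTHER_SINGLE_LINE_LOG_EVENT\n"
    "Tue 06 Jun 10:00:00 UTC 2023\n"
    "ANOTHER_SINGLE_LINE_LOG_EVENT\n"
    "Tue 06 Jun 10:00:01 UTC 2023\n"
    "LARGE_MULTILINE_EVENT"
)


def simulate_line_breaker(text, pattern):
    """Rough model of LINE_BREAKER: break at every match and drop
    the text captured by the first group."""
    events, start = [], 0
    for m in re.finditer(pattern, text):
        events.append(text[start:m.start(1)])
        start = m.end(1)
    events.append(text[start:])
    return [e for e in events if e]


# As posted: "UCT" typo, and "Tue Jun 06" field order instead of "Tue 06 Jun".
AS_POSTED = r"([\n].{3}\s.{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\sUCT\s\d{4})"
# Matches the sample's order, but the timestamp is inside the capture group.
TS_IN_GROUP = r"([\n]\w{3}\s\d{2}\s\w{3}\s\d{2}:\d{2}:\d{2}\sUTC\s\d{4})"
# Only the newline is captured; the timestamp is checked by a lookahead.
LOOKAHEAD = r"([\n]+)(?=\w{3}\s\d{2}\s\w{3}\s\d{2}:\d{2}:\d{2}\sUTC\s\d{4})"

for name, pat in [("as posted", AS_POSTED), ("ts in group", TS_IN_GROUP),
                  ("lookahead", LOOKAHEAD)]:
    print(name, simulate_line_breaker(SAMPLE, pat))
```

The lookahead variant yields three events, with lines 1 to 3 still lumped together, which is the part no line-breaking regex alone can separate in this sample.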
Hi @isoutamo, thanks for the reply.
Unfortunately, I don't know how many processes are writing to said file. I can only use it "as is".
You are right, however, that this issue should be addressed on the side of the application(s) writing to that file.
Regards
@zapping575 - I think you need to write your own parser that can do that.
Also, as far as I can see, you have a combination of single-line and multi-line events. That can also be handled in your Python code, which will act as the parser.
I hope this helps! Kindly upvote if it does!
Cheers @VatsalJagani
Thank you for the help.
I cannot use a heavy forwarder (HF); I can only use the universal forwarder (UF).
Since there are no other answers, I figure that manual preprocessing is the only way to go in this case.
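A minimal sketch of such a preprocessing step, in Python as suggested earlier in the thread, run on the file before the UF picks it up. The grouping rules are assumptions, because the thread does not say how the lines of LARGE_MULTILINE_EVENT can be told apart from the next single-line event: a timestamp line is attached to the line that follows it, lines starting with whitespace are (hypothetically) treated as continuation lines of a multiline event, and every other line is a standalone event without a timestamp. Each event is written as one JSON object per physical line, with a `time` field only when a timestamp was found; events without it would still rely on the CURRENT fallback mentioned in the question. The function names and file names are illustrative placeholders.

```python
import json
import re
import sys
from datetime import datetime, timezone

# Timestamp lines as they appear in the question, e.g. "Tue 06 Jun 10:00:00 UTC 2023".
TIMESTAMP_RE = re.compile(r"^\w{3} \d{2} \w{3} \d{2}:\d{2}:\d{2} UTC \d{4}$")
TIMESTAMP_FMT = "%a %d %b %H:%M:%S UTC %Y"


def group_events(lines):
    """Yield (epoch_seconds_or_None, message) for each logical event."""
    pending_ts = None  # timestamp seen, waiting for the line it belongs to
    current = None     # [epoch_or_None, [message lines]] of the open event

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if TIMESTAMP_RE.match(line):
            # A timestamp closes the open event and applies to the next line.
            if current:
                yield current[0], "\n".join(current[1])
                current = None
            parsed = datetime.strptime(line, TIMESTAMP_FMT)
            pending_ts = parsed.replace(tzinfo=timezone.utc).timestamp()
        elif current and line[0].isspace():
            # ASSUMPTION (hypothetical): indented lines continue a multiline event.
            current[1].append(line)
        else:
            # Any other line starts a new event, timestamped or not.
            if current:
                yield current[0], "\n".join(current[1])
            current = [pending_ts, [line]]
            pending_ts = None

    if current:
        yield current[0], "\n".join(current[1])


def to_json_lines(lines):
    """Serialize each event as a single-line JSON object (newlines are escaped)."""
    for epoch, message in group_events(lines):
        record = {"message": message}
        if epoch is not None:
            record["time"] = epoch
        yield json.dumps(record)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder file names): python preprocess.py app.log app.preprocessed.log
    with open(sys.argv[1], encoding="utf-8") as src, \
            open(sys.argv[2], "w", encoding="utf-8") as dst:
        for record in to_json_lines(src):
            dst.write(record + "\n")
```

The whitespace rule is the weakest assumption here; it would have to be replaced by whatever actually marks the continuation lines in the real file.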