I have been ripping my hair out for the last few nights trying to figure out a solution for this issue. I have a log being ingested by a UF that has some annoying characteristics. Looks a bit like this:
** Process Started **
2021-06-01 14:40:21 INFO Application is loading something
2021-06-01 14:40:22 INFO And another thing
2021-06-01 14:40:23 WARN Something might have broken
** Process Finished **
** Process Started **
2021-06-02 20:15:50 INFO Application has done something interesting
** Process Finished **
Between the two marker messages are nice, timestamped, single-line events. Those load up pretty well using defaults, but the pesky non-timestamped application messages are causing all sorts of issues. I can't filter them out, and it's preferable that events don't start with "** Process Started **".
Best hack I have been able to come up with so far is:
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3Q
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}
HEADER_FIELD_LINE_NUMBER = 2
PREAMBLE_REGEX = Process\s+Started
The first non-timestamped line is ignored and the rest are bundled into the end of an event. But there must be a better way.
Hi @Urbanpope
You can try the following, which strips the header and footer lines. Deploy this props.conf configuration to the HF/indexer.
[your_sourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}
SEDCMD-removeheadersfooters=s/\*\*\s+(Process Finished|Process Started)\s+\*\*//g
TIME_FORMAT=%Y-%m-%d %H:%M:%S
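
If you want to sanity-check the two regular expressions before deploying, they can be approximated offline in Python. This is only a rough sketch: `simulate_props` is a made-up helper (not a Splunk API), the line breaker is rewritten as a lookahead so the timestamp stays with the following event, and it assumes events left empty after the substitution can be ignored. Actual index-time behaviour may differ.

```python
import re

# Adapted from the stanza above: break on the newlines right before a timestamp.
# The timestamp is kept in a lookahead so it remains part of the next event.
LINE_BREAKER = re.compile(r"[\r\n]+(?=\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})")

# Same pattern as the SEDCMD substitution (replacement is the empty string).
SED_PATTERN = re.compile(r"\*\*\s+(Process Finished|Process Started)\s+\*\*")


def simulate_props(raw):
    """Approximate the stanza: split before timestamps, then strip markers."""
    events = []
    for chunk in LINE_BREAKER.split(raw):
        cleaned = SED_PATTERN.sub("", chunk).strip()
        if cleaned:  # assumption: events that end up empty are ignored
            events.append(cleaned)
    return events


sample = (
    "** Process Started **\n"
    "2021-06-01 14:40:21 INFO Application is loading something\n"
    "** Process Finished **\n"
    "** Process Started **\n"
    "2021-06-02 20:15:50 INFO Application has done something interesting\n"
    "** Process Finished **\n"
)

for event in simulate_props(sample):
    print(event)
```

With the sample above, only the two timestamped lines should come out as events.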
----
An upvote would be appreciated and accept solution if it helps!
Thanks for that venkatasri, much appreciated.
With a few tweaks I was able to get it to work in our dev environment.
That's great. Glad it helped.
Thanks for your reply, venkatasri.
Using SEDCMD is a good suggestion; however, we are deploying to a UF, so we are pretty limited in what we can do.
If I remember correctly, cooked data skips most of the pipelines on the indexer at index time, but would that also apply to SEDCMD?
UF functionality is limited to input and forwarding; the actual cooking happens on the HF/indexer. Having said that, if you send HF -> indexer, the indexer just does the indexing and the rest of the pipelines are skipped, because the data was already processed on the HF.
SEDCMD works only at index time, which means on the HF/indexer. If you have that limitation and no control over the HF/indexer, then pre-process the file by removing the lines that do not have a timestamp, and configure the UF to monitor the pre-processed files.
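
As a rough illustration of the pre-processing route, a small Python script could copy the log and keep only the lines that start with a timestamp. This is a minimal sketch, not a tested deployment: the function names and file paths are made up, the timestamp format is assumed to match the sample in the question, and it rewrites the whole output file on every run, so a log that is appended to continuously would need something incremental.

```python
import re

# Lines to keep must begin with a timestamp such as "2021-06-01 14:40:21".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}")


def keep_timestamped(lines):
    """Yield only the lines that start with a timestamp."""
    for line in lines:
        if TIMESTAMP.match(line):
            yield line


def preprocess(src, dst):
    """Copy src to dst, dropping every line without a leading timestamp."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(keep_timestamped(fin))
```

For example, `preprocess("app.log", "app.clean.log")` (hypothetical paths) would write the cleaned copy, and the UF monitor stanza would then point at the cleaned file instead of the original.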