Hello,
I have a log file where each event starts with a date; however, there are two date formats. Some events span multiple lines, and some of the data is separated by a blank line. When I upload the file, Splunk treats the blank line as the start of a new event, so every line after the blank line gets split into its own event. Here's an example:
2020-11-02 18:40:31,293+0000 some data INFO some more data
2020-11-03 18:40:31,293+0000 some data INFO some more data
2020-11-05 18:40:31,293+0000 some data INFO some more data
06-FEB-2020 18:40:11.289 INFO [main} data some more data
2020-11-12 18:40:31,293+0000 some data INFO some more data
data to look for
___testing________
ID:0
type: Fruit
Name: Mango
Desc: Ripe
2020-11-22 18:40:31,293+0000 some data INFO some more data
starting something new
2020-11-23 18:40:31,293+0000 some data INFO some more data
I think telling Splunk to ignore blank lines, or removing them, should fix my problem, since I want to keep all the multiline data together within the event that starts with a date, but I haven't had much luck getting the appropriate regex to work.
I hope the experts can help with this. Thanks in advance.
It's a new dataset that I'm trying to upload, so I'm currently creating the props for it. The log is a catalina.out file, so it receives events from multiple executables, which means there will be different date formats at the start of events. Based on the sample data I provided above, Splunk is converting it into events like this:
event 1: 2020-11-02 18:40:31,293+0000 some data INFO some more data
event 2: 2020-11-03 18:40:31,293+0000 some data INFO some more data
event 3: 2020-11-05 18:40:31,293+0000 some data INFO some more data
event 4: 06-FEB-2020 18:40:11.289 INFO [main} data some more data
event 5: 2020-11-12 18:40:31,293+0000 some data INFO some more data
event 6: data to look for
event 7: ___testing________
event 8: ID:0
event 9: type: Fruit
event 10: Name: Mango
event 11: Desc: Ripe
event 12: 2020-11-22 18:40:31,293+0000 some data INFO some more data
event 13: starting something new
event 14: 2020-11-23 18:40:31,293+0000 some data INFO some more data
Sorry, I hope this helps.
It's not the most efficient way to parse events, but these settings may help.
[mysourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
TIME_PREFIX = ^
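If line merging ever becomes a performance concern, another approach is to turn merging off and break only where a newline run is followed by one of the two timestamp shapes. This is just a sketch, assuming the sample's two formats are the only ones in the file (`mysourcetype` is a placeholder):

```
[mysourcetype]
SHOULD_LINEMERGE = false
# Break only before either timestamp format:
#   2020-11-02 18:40:31,293+0000  or  06-FEB-2020 18:40:11.289
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\d{2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
# 24 characters covers both timestamps, stopping before the +0000 offset
MAX_TIMESTAMP_LOOKAHEAD = 24
```

With `SHOULD_LINEMERGE = false`, blank lines inside an event are never treated as boundaries, because only newlines followed by a timestamp match the breaker.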
This worked, thanks so much.
OK, I spoke too soon.
With MAX_TIMESTAMP_LOOKAHEAD = 23, it works with these props:
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n])+
NO_BINARY_CHECK = true
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 23
If I remove the lookahead value, it doesn't work, probably because of the one date format that has no timezone (zulu) info. A lookahead of 23 is exactly the length of "2020-11-02 18:40:31,293", so timestamp parsing stops before the +0000 offset.
If I use your example as-is, it doesn't break out every event per date. It does for some, but once it reaches the event with the tabs, blanks, and newlines, it groups all of those dated events and their data into a single event.
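One way to sanity-check a LINE_BREAKER-style regex before restarting Splunk is to try it against a few sample lines in Python. This is only a sketch, assuming the two timestamp formats from the sample above:

```python
import re

# Candidate break pattern: split only where newlines are followed by one of
# the two timestamp formats seen in the sample (an assumption; adjust if
# catalina.out carries other formats too).
pattern = re.compile(
    r"[\r\n]+(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"   # 2020-11-12 18:40:31,...
    r"|\d{2}-[A-Z]{3}-\d{4} \d{2}:\d{2}:\d{2})"        # 06-FEB-2020 18:40:11....
)

sample = (
    "2020-11-12 18:40:31,293+0000 some data INFO some more data\n"
    "data to look for\n"
    "\n"
    "ID:0\n"
    "type: Fruit\n"
    "2020-11-22 18:40:31,293+0000 some data INFO some more data\n"
)

# Blank lines stay inside the first event; only the dated line starts a new one.
events = [e for e in pattern.split(sample) if e.strip()]
for e in events:
    print(repr(e))
```

If the multiline block (including the blank line) stays attached to the first dated event, the pattern is a reasonable candidate to drop into props.conf.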
What are the current props.conf settings for that sourcetype?
I've modified the file to remove the tabs, blank lines, and extra newlines, and only then am I able to keep the data together in its event. So maybe I'll need to ask whether the client can clean up the file before it lands in the forwarder's designated folder.
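If the client can pre-process the file, the cleanup could be as simple as dropping blank lines before the file reaches the forwarder's folder. A minimal sketch (hypothetical; a real job would read and rewrite the actual monitored file):

```python
def strip_blank_lines(text: str) -> str:
    """Remove blank (or whitespace-only) lines so Splunk never sees an
    empty line as an event boundary."""
    return "\n".join(line for line in text.splitlines() if line.strip()) + "\n"

# Hypothetical sample mirroring the problem data above.
raw = (
    "2020-11-12 18:40:31,293+0000 some data INFO some more data\n"
    "data to look for\n"
    "\n"
    "ID:0\n"
)
print(strip_blank_lines(raw))
```

Note this only removes the blank-line boundary problem; the continuation lines would still rely on the props settings to stay merged with the dated line above them.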
Unless you have another solution.
I'm chatting with the customer. Your suggestion plus the lookahead value does keep the events separate, so we can at least get the data to stay together; then we can work on extraction and correcting the format once it's imported.
I'll call this a success 🙂