I posted something about this here:
My last question didn't really address the issue, I think, because I was using index preview and had uploaded a standalone, inactive log.
I have a log that is the output of a script that runs and checks some components of some software.
The log has a Check Start time and a Check End time. In between Start and End are other lines indicating when parts of the script have passed....or even failed.
Because the results are written live and not every line has a timestamp, I'm running into issues indexing this log correctly. The script's output might come in a few lines a second, pause... pause some more, a few more lines... then finally finish with Check End.

Splunk is taking each little "chunk" as it arrives and making a new event out of it. I don't want that; I want Splunk to index one event from Start until End. Is there a way to have Splunk "wait" for all the data before posting to the index?
So far I have tried the following props.conf entries on my Splunk indexer, each at a separate time of course. Do I need a combination of LINE_BREAKER and MUST_BREAK_AFTER?
[myscript]
BREAK_ONLY_BEFORE = \bCheck start: \w+ \w+ \d+ \d+:\d+:\d+ \w+ \d+\b
MAX_EVENTS = 19

[myscript]
LINE_BREAKER = (Check start:)(\[r\n]+)(Check end:)

[myscript]
LINE_BREAKER = Check start:\[r\n]+|Check end:

[myscript]
LINE_BREAKER=([\r\n]+)(=+[\r\n]+Check start)

[myscript]
LINE_BREAKER = (Check start:([rn]+)Check end:)
SHOULD_LINEMERGE = false
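(One thing I notice writing these out: a couple of the attempts escape the opening bracket, so `\[r\n]+` matches a literal `[r`, a newline, then `]`s, instead of the `[\r\n]+` character class. A quick sanity check with Python's `re`, whose syntax is close to Splunk's PCRE for these constructs:)

```python
import re

sample = "Check start:\nCheck end:"

# The escaped form from the attempts above: literal "[r", a newline,
# then one or more "]" characters -- text that never occurs in the log.
print(re.search(r"\[r\n]+", sample))   # None, never breaks

# The intended character class: a run of CR/LF characters.
print(re.search(r"[\r\n]+", sample))   # matches the newline
```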
I want the text below to show up as one log entry, ideally timestamped with the time the Check starts.
===========================================
Check start: Tue Feb 5 23:28:10 EST 2013
===========================================
Accessing ATOMPUB_URL: http://BLAHBLAH:810/wackadoo/lobster with userid admin@AWESOMEHOST
Grabbing heartbeat folder...
Reading test file for upload...
Checking if file already exists...
Creating test upload...
Document successfully created.
Verifying that document was successfully uploaded...
Verifying that rendition was successfully created...
Deleting uploaded document...
Deletion successful.
Test successfully completed.
Heartbeat check returned in 4001ms.
Heartbeat check status:0
===========================================
Check end: Tue Feb 5 23:28:14 EST 2013
===========================================
Here is an example of what Splunk is doing.
1   2/13/13 3:54:34.000 PM
Deleting uploaded document...
Deletion successful.
Test successfully completed.
Heartbeat check returned in 4001ms.
Heartbeat check status:0
===========================================
Check end: Wed Feb 13 15:54:34 EST 2013
===========================================

2   2/13/13 3:54:33.000 PM
Grabbing heartbeat folder...
Reading test file for upload...
Checking if file already exists...
Creating test upload...
Document successfully created.
Verifying that document was successfully uploaded...
Verifying that rendition was successfully created...

3   2/13/13 3:54:30.000 PM
===========================================
Check start: Wed Feb 13 15:54:30 EST 2013
===========================================
Accessing ATOMPUB_URL: http://BLAHBLAH:810/wackadoo/lobster with userid admin@awesomehost
Forward lookahead is your friend:
LINE_BREAKER = ([\r\n]+)(?=\=+[\n\r]+Check start)
SHOULD_LINEMERGE = false
TIME_PREFIX = Check start:
Note, there's a space at the end of the TIME_PREFIX, so it's "Check start: ".
The line breaker is any run of newlines (captured, and so removed from the event), but only where it is followed by a run of =s, then a newline, then "Check start".
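You can convince yourself the lookahead breaks in the right place before touching Splunk. This sketch (Python's `re`, close enough to Splunk's PCRE for this pattern) splits an abbreviated sample containing two consecutive check runs:

```python
import re

# Two consecutive check runs, abbreviated, as they would sit in the log file.
log = (
    "===========================================\n"
    "Check start: Tue Feb 5 23:28:10 EST 2013\n"
    "===========================================\n"
    "Grabbing heartbeat folder...\n"
    "Test successfully completed.\n"
    "===========================================\n"
    "Check end: Tue Feb 5 23:28:14 EST 2013\n"
    "===========================================\n"
    "===========================================\n"
    "Check start: Tue Feb 5 23:30:10 EST 2013\n"
    "===========================================\n"
    "Grabbing heartbeat folder...\n"
)

# Same idea as the LINE_BREAKER above: consume (and discard) the newlines,
# but only where a banner of ='s followed by "Check start" comes next.
breaker = re.compile(r"([\r\n]+)(?==+[\r\n]+Check start)")

# re.split also returns the captured newlines; drop those pure-newline pieces.
events = [part for part in breaker.split(log) if part.strip("\r\n")]

print(len(events))  # 2 -- one event per check run, split only at the banner
```

Note the break lands on the newline just before the `===` banner, so the banner stays at the top of the new event and TIME_PREFIX can still find "Check start:" inside it.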
Well, the output of the script is in the log file in question, but it comes into the log file as the tasks are completed. That shows us where the script is based on the output in the log, or, if a part of the script breaks, gives us an idea of where. The log file that I'm indexing is a "summary" of all the runs of the script until it gets big enough to be rotated. The script runs constantly and dumps a summary to this log every 2 minutes.
The problem with this is that the log lines that make up the event don't all come in at the same time... they arrive over a 2-minute interval as tasks are completed in the script... so I don't think Splunk will ever know to group them all together as one event.
I put this in props.conf on the server:
LINE_BREAKER = ([\r\n]+)(?==+[\n\r]+Check start)
SHOULD_LINEMERGE = false
TIME_PREFIX = Check start:
Note: it's exactly what you posted, but the slashes are not showing up! It hasn't changed the output of the log in Splunk at all. I'm about to ask the person who developed the script to just put a timestamp on every event. No matter what I put in, nothing seems to work 😞 We're going to see about having the script wait and then dump all 19 lines at once, to see if that resolves the issue.
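If batching the script's output isn't an option, props.conf also has settings aimed at slow writers like this, `time_before_close` and `multiline_event_extra_waittime`. The stanza below is a sketch combining them with the accepted answer's settings; the 150-second value is a guess sized to the 2-minute write window, and both settings should be checked against the props.conf spec for your Splunk version:

```
[myscript]
LINE_BREAKER = ([\r\n]+)(?==+[\r\n]+Check start)
SHOULD_LINEMERGE = false
TIME_PREFIX = Check start:
# Keep the file open longer after hitting EOF, so a half-written
# check isn't cut off between the script's bursts of output...
time_before_close = 150
# ...and don't finalize a multiline event at EOF; wait until the
# file is closed, so all lines of the check can arrive first.
multiline_event_extra_waittime = true
```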