I'm trying to parse multiline structured tabular events like this:
CPU Schedule Job State Pr Start Elapse Dependencies Return Code
APP10CNHL #CF14330AAAAAAAAG *LIM 5* STUCK 10 11/27 [NHIPPK600S90 ,11/26/14]
  SPK6009001 SUCC 10 11/27 00:01 #J10175 0
  SPK60090CP ABEND 10 11/27 00:01 #J10416 5
  IPK6009003 HOLD 10(11/26) SPK60090CP
  IPK6009004 HOLD 10(11/26) SPK60090CP
  IPK6009005 HOLD 10(11/26) SPK60090CP
  IPK6009006 HOLD 10(11/26) SPK60090CP
  FPK60090ZZ HOLD 10(11/26) IPK6009003; IPK6009004; IPK6009005 IPK6009006
APP10CNHL #NHIPPK605GCL *LIM 4* SUCC 10 15:00 00:19 [Carry]
  GPK605_RMDIR SUCC 10 15:00 00:01 #J32208 0
  GPK605_RMDIR_GC1 SUCC 10 15:00 00:01 #J32210 0
  (IVPRPTEC#)GPK605_STOP_LSNR_U SUCC 10 15:00 00:01 #J17236184 0
  (IVPRPTEC#)GPK605_STOP_LSNR SUCC 10 15:00 00:01 #J39714998 0
  (IVPRPTEC#)GPK6_SHOW_SESSION SUCC 10 15:00 00:01 #J57409632 0
  (IVPRPTEC#)GPK605_KILL_SESSION SUCC 10 15:00 00:01 #J57409644 0
  (IVPRPTEC#)SLEEP_60 SUCC 10 15:01 00:02 #J57409672 0
  GPK605_MAIL_OFF SUCC 10 15:02 00:01 #J1133 0
  (IVPRPTEC#)GPK605_START_LSNR SUCC 10 15:02 00:01 #J39714862 0
  (IVPRPTEC#)SLEEP_300 SUCC 10 15:02 00:06 #J39714878 0
  (IVPRPTEC#)GPK6BK0001_EXPORT SUCC 10 15:07 00:11 #J14352512 0
  GPK605_MKDIR SUCC 10 15:18 00:01 #J30532 0
  FINERETE SUCC 10 15:18 00:01 #J1647
Each event starts with a schedule (ScheduleID) header line, followed by one or more JOB lines.
Event breaking works fine, and so does field extraction for the first line. My problem is extracting fields from the JOB lines that follow, with some kind of repeating regex.
Here's the field extraction for the first (ScheduleID) line:
[batch_prp]
EXTRACT-batch-schedule-header = (?mi)^(?<CPU>\w+)\s+(?<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+(?<STATE>\w+)\s+(?<PR>\d+)(\s|\()(?<START>\d\d\S\d\d)?.*(\[(?<RETURN>.*)\])?
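To check what the header extraction actually captures, here is a minimal sketch in Python's `re` (used only for illustration; Splunk uses PCRE, so small differences are possible). It runs the same pattern, with `(?<name>...)` rewritten as `(?P<name>...)`, against the two header lines from the sample. Because the greedy `.*` already consumes the bracketed text, the optional `RETURN` group always comes back empty. `HEADER_RE_FIXED` is a hypothetical variant that makes that `.*` lazy and anchors the end of the line with `$`, so the bracket is captured.

```python
import re

# The EXTRACT-batch-schedule-header pattern, with (?<name>...) rewritten as
# Python's (?P<name>...); everything else is unchanged.
HEADER_RE = re.compile(
    r"(?mi)^(?P<CPU>\w+)\s+(?P<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+"
    r"(?P<STATE>\w+)\s+(?P<PR>\d+)(\s|\()(?P<START>\d\d\S\d\d)?.*"
    r"(\[(?P<RETURN>.*)\])?"
)

# Hypothetical variant: a lazy .*? plus an end-of-line anchor ($) forces the
# engine to try the optional bracket group instead of skipping it.
HEADER_RE_FIXED = re.compile(
    r"(?mi)^(?P<CPU>\w+)\s+(?P<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+"
    r"(?P<STATE>\w+)\s+(?P<PR>\d+)(\s|\()(?P<START>\d\d\S\d\d)?.*?"
    r"(\[(?P<RETURN>.*)\])?$"
)

headers = [
    "APP10CNHL #CF14330AAAAAAAAG *LIM 5* STUCK 10 11/27 [NHIPPK600S90 ,11/26/14]",
    "APP10CNHL #NHIPPK605GCL *LIM 4* SUCC 10 15:00 00:19 [Carry]",
]

for line in headers:
    m = HEADER_RE.search(line)
    fixed = HEADER_RE_FIXED.search(line)
    # Original RETURN is None (swallowed by .*); the fixed variant captures it.
    print(m.group("CPU", "SCHEDULE", "STATE", "PR", "START"),
          m.group("RETURN"), fixed.group("RETURN"))
```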
I then tried to extract the JOB lines as a multivalue field, "Line", with this rex:
sourcetype=batch_prp |rex "(?m)^((\s+(?<Line>.*)$))+"
but this extracts only the last occurrence of Line in each event.
Any idea how to write the regex so that it extracts all the values of the "Line" field?
Thanks a lot!!!
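The behaviour described above is standard regex semantics rather than a Splunk quirk: when a capture group sits inside a quantified group such as `(...)+`, the engine keeps only the text from the last repetition. The short Python sketch below (Python `re` is used purely for illustration, on a three-job excerpt of the sample) reproduces that, then contrasts it with matching one job line at a time and collecting every match.

```python
import re

# A small excerpt of one event: a header line followed by indented job lines.
event = "\n".join([
    "APP10CNHL #CF14330AAAAAAAAG *LIM 5* STUCK 10 11/27 [NHIPPK600S90 ,11/26/14]",
    "  SPK6009001 SUCC 10 11/27 00:01 #J10175 0",
    "  SPK60090CP ABEND 10 11/27 00:01 #J10416 5",
    "  IPK6009003 HOLD 10(11/26) SPK60090CP",
])

# The pattern from the question: the capture sits inside a repeated group,
# so after the match only the last repetition's text is kept in Line.
repeated = re.search(r"(?m)^((\s+(?P<Line>.*)$))+", event)
print(repeated.group("Line"))

# Match a single indented line instead and collect every match.
# [ \t]+ (rather than \s+) stops the match from running across newlines.
lines = re.findall(r"(?m)^[ \t]+(.*)$", event)
print(lines)
```

In SPL, I believe the corresponding change is to drop the outer `(...)+` and give rex `max_match=0`, which lets it keep every match as a multivalue field instead of only the first.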
I indexed your file email-prp-xxxx-batch .txt and tried to write some regexes to extract the fields. I still don't fully understand the data, but for now this may help.
NOTE: to run this on your server, make sure you use the appropriate index and sourcetype. mar is the sourcetype I created during data input, and marco is the index in which I placed the data.
index=marco sourcetype="mar" | head 10000
| rex "(?im)^[^\\-\\n]*\\-(?P<CPU>[^#]+)"
| rex "(?i)9CNHL (?P<SCHEDULE>[^ ]+)"
| rex "(?i)(?P<SCHEDULE>[^ ]+)\\s+\\*\\w+\\s+\\d+\\*\\s+\\w+\\s+\\d+\\s+\\d+:\\d+\\s+\\d+:\\d+\\s+\\[\\w+\\]"
| rex "(?im)^\\s+(?P<JOB>[^ ]+)"
| rex "(?im)^\\s+\\w+\\s+(?P<STATE>[^ ]+)"
| rex "(?i) SUCC (?P<PR>[^ ]+)"
| rex "(?i)^\\s+[a-z_-]+\\w+\\s+\\w+\\s+\\d+\\s+(?P<START>[^ ]+)"
| rex "(?i)^[^/]*/\\d+\\s+\\d+:\\d+\\s+(?P<DEPENDENCIES>[^ ]+)"
| rex "(?i)^[^/]*/\\d+\\s+(?P<ELAPSE>[^ ]+)"
| table SCHEDULE JOB STATE PR START ELAPSE DEPENDENCIES
Hi Stephane, and thanks for trying. You basically split the rex into several rex commands, one per piece of data, but the same problem remains: your search captures only the first JOB line. I need to capture ALL the JOB lines under the header line that contains CPU, SCHEDULE and the other fields. For instance, if you try Schedule CF14330AAAAAAAAF (the first one in the file), you'll extract only the first JOB line out of 20.
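The key change is to match each job line on its own and keep every match rather than the first one, then tag each job with its schedule. The Python sketch below illustrates that under clear assumptions: the column meanings (in particular that `#J…` is a job number and the trailing digit a return code) are inferred from the sample, and the field names `HOLD_DATE`, `JOBNUM`, `RC` and `DEPS`, as well as the helper `parse_event`, are hypothetical.

```python
import re

# Header line: starts in column 0; only CPU and SCHEDULE are kept here.
HEADER_RE = re.compile(r"(?m)^(?P<CPU>\w+)\s+(?P<SCHEDULE>#\w+)")

# Job line: indented. Column meanings are assumptions inferred from the sample:
#   SUCC/ABEND jobs: JOB STATE PR START ELAPSE #Jnnn RC
#   HOLD jobs:       JOB STATE PR(date) DEPENDENCIES
JOB_RE = re.compile(
    r"(?m)^[ \t]+"
    r"(?P<JOB>\S+)[ \t]+"
    r"(?P<STATE>[A-Z]+)[ \t]+"
    r"(?P<PR>\d+)"
    r"(?:\((?P<HOLD_DATE>[^)]*)\)"                       # e.g. 10(11/26)
    r"|[ \t]+(?P<START>\S+)[ \t]+(?P<ELAPSE>\d+:\d+))?"  # e.g. 10 11/27 00:01
    r"(?:[ \t]+(?P<JOBNUM>#J\d+))?"
    r"(?:[ \t]+(?P<RC>\d+))?"
    r"(?:[ \t]+(?P<DEPS>.+?))?[ \t]*$"
)


def parse_event(event: str) -> list:
    """Return one dict per job line, each tagged with the event's schedule."""
    header = HEADER_RE.search(event)
    schedule = header.group("SCHEDULE") if header else None
    rows = []
    for m in JOB_RE.finditer(event):  # every job line, not just the first
        row = {k: v for k, v in m.groupdict().items() if v is not None}
        row["SCHEDULE"] = schedule
        rows.append(row)
    return rows


event = "\n".join([
    "APP10CNHL #CF14330AAAAAAAAG *LIM 5* STUCK 10 11/27 [NHIPPK600S90 ,11/26/14]",
    "  SPK6009001 SUCC 10 11/27 00:01 #J10175 0",
    "  SPK60090CP ABEND 10 11/27 00:01 #J10416 5",
    "  IPK6009003 HOLD 10(11/26) SPK60090CP",
])

for row in parse_event(event):
    print(row)
```

In Splunk, a comparable (untested) pipeline would extract the job lines as a multivalue field, split them into one result per job with `mvexpand`, and then run per-line rex commands like the ones above.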