Splunk Search

Extract multivalue fields from multiline events


Hi All,
I'm trying to parse multiline structured tabular events like this:

CPU              Schedule           Job                                    State Pr Start  Elapse  Dependencies  Return Code
APP10CNHL       #CF14330AAAAAAAAG *LIM  5*                                 STUCK 10 11/27         [NHIPPK600S90    ,11/26/14]
                                         SPK6009001                               SUCC  10 11/27  00:01  #J10175                0
                                         SPK60090CP                               ABEND 10 11/27  00:01  #J10416                5
                                         IPK6009003                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009004                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009005                               HOLD  10(11/26)        SPK60090CP
                                         IPK6009006                               HOLD  10(11/26)        SPK60090CP
                                         FPK60090ZZ                               HOLD  10(11/26)        IPK6009003; IPK6009004; IPK6009005
    APP10CNHL       #NHIPPK605GCL     *LIM  4*                                 SUCC  10 15:00  00:19  [Carry]
                                     GPK605_RMDIR                             SUCC  10 15:00  00:01  #J32208                0
                                     GPK605_RMDIR_GC1                         SUCC  10 15:00  00:01  #J32210                0
                          (IVPRPTEC#)GPK605_STOP_LSNR_U                       SUCC  10 15:00  00:01  #J17236184             0
                          (IVPRPTEC#)GPK605_STOP_LSNR                         SUCC  10 15:00  00:01  #J39714998             0
                          (IVPRPTEC#)GPK6_SHOW_SESSION                        SUCC  10 15:00  00:01  #J57409632             0
                          (IVPRPTEC#)GPK605_KILL_SESSION                      SUCC  10 15:00  00:01  #J57409644             0
                          (IVPRPTEC#)SLEEP_60                                 SUCC  10 15:01  00:02  #J57409672             0
                                     GPK605_MAIL_OFF                          SUCC  10 15:02  00:01  #J1133                 0
                          (IVPRPTEC#)GPK605_START_LSNR                        SUCC  10 15:02  00:01  #J39714862             0
                          (IVPRPTEC#)SLEEP_300                                SUCC  10 15:02  00:06  #J39714878             0
                          (IVPRPTEC#)GPK6BK0001_EXPORT                        SUCC  10 15:07  00:11  #J14352512             0
                                     GPK605_MKDIR                             SUCC  10 15:18  00:01  #J30532                0
                                     FINERETE                                 SUCC  10 15:18  00:01  #J1647  

Each event starts with a ScheduleID line, followed by one or more JOB lines.

Event breaking works fine, and so does field extraction for the first line. My problem is extracting fields from the JOB lines that follow, with some kind of repeating regex.

Here's the field extraction for the first (ScheduleID) line:

EXTRACT-batch-schedule-header = (?mi)^(?<CPU>\w+)\s+(?<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+(?<STATE>\w+)\s+(?<PR>\d+)(\s|\()(?<START>\d\d\S\d\d)?.*(\[(?<RETURN>.*)\])?
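You can sanity-check a header regex like this outside Splunk with Python's `re` module. Python requires named groups to be written `(?P<name>...)`, and Splunk's PCRE also accepts that form. Such a check turns up one thing worth knowing. The greedy `.*` just before the optional `(\[...\])?` group consumes the rest of the line, so the optional group always matches empty and RETURN is never filled. The TWEAKED pattern below is a hypothetical variant, not from the original post. It moves a lazy `.*?` inside the optional group so the bracketed text can be captured. It is only checked against the sample header line, not the full file.

```python
import re

# First header line from the sample event above
HEADER = ("APP10CNHL       #CF14330AAAAAAAAG *LIM  5*                                 "
          "STUCK 10 11/27         [NHIPPK600S90    ,11/26/14]")

# The posted EXTRACT regex, with (?<name>...) spelled (?P<name>...) for Python
ORIGINAL = re.compile(
    r"(?mi)^(?P<CPU>\w+)\s+(?P<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+"
    r"(?P<STATE>\w+)\s+(?P<PR>\d+)(\s|\()(?P<START>\d\d\S\d\d)?"
    r".*(\[(?P<RETURN>.*)\])?"
)

# Hypothetical tweak: the lazy .*? lives inside the optional group, so the
# group can still reach a "[...]" later on the same line instead of being
# skipped after a greedy .* has already eaten it
TWEAKED = re.compile(
    r"(?mi)^(?P<CPU>\w+)\s+(?P<SCHEDULE>#\w+)\s+\*\w+\s+\d\*\s+"
    r"(?P<STATE>\w+)\s+(?P<PR>\d+)[\s(](?P<START>\d\d\S\d\d)?"
    r"(?:.*?\[(?P<RETURN>[^\]\n]*)\])?"
)

if __name__ == "__main__":
    for name, rx in (("original", ORIGINAL), ("tweaked", TWEAKED)):
        m = rx.search(HEADER)
        print(name, m.groupdict() if m else None)
```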

I then tried to extract the job lines into a multivalue field "Line" with this rex:

sourcetype=batch_prp  |rex "(?m)^((\s+(?<Line>.*)$))+"

but this extracts only the last occurrence of Line in each event.
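This is standard regex behaviour rather than a Splunk quirk. When a capturing group sits inside a repeated group like `(...)+`, the engine keeps only the text from that group's last repetition. Below is a minimal Python illustration using the posted pattern, with the named group respelled for Python, on a trimmed copy of the sample event. PCRE and Python's `re` agree on this point. One match spans all the job lines, but `Line` holds only the last one.

```python
import re

# Trimmed copy of the sample event: one header line plus three job lines
EVENT = "\n".join([
    "APP10CNHL       #CF14330AAAAAAAAG *LIM  5*   STUCK 10 11/27",
    "         SPK6009001      SUCC  10 11/27  00:01  #J10175   0",
    "         SPK60090CP      ABEND 10 11/27  00:01  #J10416   5",
    "         IPK6009003      HOLD  10(11/26)        SPK60090CP",
])

# The posted pattern; (?<Line>...) is spelled (?P<Line>...) for Python
REPEATED = re.compile(r"(?m)^((\s+(?P<Line>.*)$))+")

m = REPEATED.search(EVENT)
# The single match covers every job line (\s+ also swallows the newlines) ...
print(repr(m.group(0)))
# ... but the Line group only remembers its last repetition
print(m.group("Line"))
```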

Any idea how to write the regex so that it extracts all the different values of the "Line" field?
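In Splunk, the usual approach is a pattern that matches ONE job line rather than a repeated group. You apply it with the `rex` command's `max_match` argument (default 1, where 0 means unlimited), which applies the regex repeatedly and makes each named group a multivalue field. The Python sketch below mimics that idea with `re.finditer` on a trimmed copy of the sample event. The job-line pattern and the function name `parse_job_lines` are illustrative assumptions. The sketch only pulls JOB, STATE, PR and START, not the remaining columns.

```python
import re

# Hypothetical pattern for ONE job line: leading blanks, job name, state,
# priority, then an optional start such as " 11/27", "(11/26" or " 15:00".
# The header line starts at column 0, so ^\s+ keeps it from matching.
JOB_LINE = re.compile(
    r"(?m)^\s+(?P<JOB>\S+)\s+(?P<STATE>[A-Z]+)\s+(?P<PR>\d+)"
    r"(?:[\s(](?P<START>\d\d\S\d\d))?"
)


def parse_job_lines(event: str) -> list:
    """Return one dict per matching job line.

    Roughly what rex with max_match=0 does, except that Splunk stores each
    named group as a multivalue field instead of one dict per line.
    """
    return [m.groupdict() for m in JOB_LINE.finditer(event)]


EVENT = "\n".join([
    "APP10CNHL       #CF14330AAAAAAAAG *LIM  5*   STUCK 10 11/27   [NHIPPK600S90    ,11/26/14]",
    "         SPK6009001      SUCC  10 11/27  00:01  #J10175   0",
    "         SPK60090CP      ABEND 10 11/27  00:01  #J10416   5",
    "         FPK60090ZZ      HOLD  10(11/26)        IPK6009003; IPK6009004; IPK6009005",
    "   (IVPRPTEC#)GPK605_STOP_LSNR_U   SUCC  10 15:00  00:01  #J17236184   0",
])

if __name__ == "__main__":
    for job in parse_job_lines(EVENT):
        print(job)
```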

Thanks a lot!!!

Marco Scala


Hello Marco,

I have indexed your file email-prp-xxxx-batch.txt and tried to write some regexes to extract the fields. I still don't fully understand the data, but for the moment I hope this helps.

NOTE: to run this on your server, make sure you use an appropriate sourcetype. "mar" is the sourcetype I created during data input, and "marco" is the index in which I placed the data.

index=marco sourcetype="mar" | head 10000 | rex "(?im)^[^\\-\\n]*\\-(?P<CPU>[^#]+)" | rex "(?i)9CNHL       (?P<SCHEDULE>[^ ]+)" 
| rex "(?i)(?P<SCHEDULE>[^ ]+)\\s+\\*\\w+\\s+\\d+\\*\\s+\\w+\\s+\\d+\\s+\\d+:\\d+\\s+\\d+:\\d+\\s+\\[\\w+\\]" |rex "(?im)^\\s+(?P<JOB>[^ ]+)"| rex "(?im)^\\s+\\w+\\s+(?P<STATE>[^ ]+)" | rex "(?i) SUCC  (?P<PR>[^ ]+)" | rex "(?i)^\\s+[a-z_-]+\\w+\\s+\\w+\\s+\\d+\\s+(?P<START>[^ ]+)"| rex "(?i)^[^/]*/\\d+\\s+\\d+:\\d+\\s+(?P<DEPENDENCIES>[^ ]+)"| rex "(?i)^[^/]*/\\d+\\s+(?P<ELAPSE>[^ ]+)"  |table SCHEDULE JOB STATE PR START ELAPSE DEPENDENCIES


Hi Stephane, and thanks for trying. You basically split the regex into several rex commands, one for each piece of data, but the same problem remains: your search captures only the first JOB line. I need to capture ALL the JOB lines under the header line that contains CPU, SCHEDULE and the other fields. If you try, for instance, Schedule CF14330AAAAAAAAF (the first one in the file), you'll extract only the first JOB line out of 20.



Hi Marcoscala, have you managed to solve this problem? I'm facing the same kind of problem.



PS: if anybody wants to play with this file, please let me know and I'll email it!



Hi, I think I can help you. Just send me the file to work on at cyrilleko@gmail.com
