Splunk Search

Filtering lines in multiline events

danillopavan
Communicator

Hello all,

I would like to understand how to filter lines within multiline events. My events have around 30 lines each, and I would like to disregard most of them; my objective is to keep just 4-5 lines per multiline event. I need to combine event filtering with filtering of lines within the multiline events. Any ideas?

Thanks and regards,
Danillo Pavan

1 Solution

somesoni2
Revered Legend

Give this a try (a sourcetype definition for your data, to be put in props.conf on the indexer/heavy forwarder/standalone server, whichever comes first in the data flow):

[YourSourceType]
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=SOMEJUNK
SEDCMD-preserveJobexce=s/(\++\s+)(.+ executed.+)/= \2/
SEDCMD-removepluslines=s/(\s*\++.+)//g
SEDCMD-removejunk=s/(\=\=+[\r\n]*)//g
SEDCMD-removejunk1=s/(\=\s+\w+\s+:.+[\r\n])//g
SEDCMD-removejunk2=s/[\r\n]\s*[A-z]+.+//g
TIME_PREFIX=\=\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z

View solution in original post


micahkemp
Champion

Assuming the lines were already indexed as a single event, this absurd search seems to filter the lines as you've asked:

| rex mode=sed "s/(= [A-Z][a-z].*)/match \1 match/g"
| rex mode=sed "s/(.*executed.*)/match \1 match/g"
| rex mode=sed "s/(^|([\n\r]+))(?!match).*(?!match)//g"
| rex mode=sed "s/(^|([\n\r]+))match (.*) match/\3\n/g"
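(How this works: the first two rex commands wrap the lines to keep in match ... match markers, the third deletes every line that is not marked, and the fourth strips the markers back off.)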

micahkemp
Champion

But filtering and grouping the lines at index time (as @somesoni2's answer aims to do) is the better option. It's also more economical in terms of license usage (by far).


somesoni2
Revered Legend

Agreed. And in your case, reducing 30+ lines to around 7 lines would definitely be a big saving. Please note that index-time settings do not alter already-ingested data; only new data arriving after you set this up will be filtered.


danillopavan
Communicator

Yes, I agree. I have another case where the log file has more than 80 lines and I would like to keep just 5 or 6, disregarding the others. It's similar to this one, but I haven't been able to get that filtering to work...


somesoni2
Revered Legend

Give this a try (a sourcetype definition for your data, to be put in props.conf on the indexer/heavy forwarder/standalone server, whichever comes first in the data flow):

[YourSourceType]
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=SOMEJUNK
SEDCMD-preserveJobexce=s/(\++\s+)(.+ executed.+)/= \2/
SEDCMD-removepluslines=s/(\s*\++.+)//g
SEDCMD-removejunk=s/(\=\=+[\r\n]*)//g
SEDCMD-removejunk1=s/(\=\s+\w+\s+:.+[\r\n])//g
SEDCMD-removejunk2=s/[\r\n]\s*[A-z]+.+//g
TIME_PREFIX=\=\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z
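
If you want to preview the effect of these SEDCMDs before deploying them, the same sed expressions can be run at search time with rex mode=sed (a sketch; <your base search> is a placeholder for your actual search):

<your base search>
| rex mode=sed "s/(\++\s+)(.+ executed.+)/= \2/"
| rex mode=sed "s/(\s*\++.+)//g"
| rex mode=sed "s/(\=\=+[\r\n]*)//g"
| rex mode=sed "s/(\=\s+\w+\s+:.+[\r\n])//g"
| rex mode=sed "s/[\r\n]\s*[A-z]+.+//g"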

danillopavan
Communicator

I just checked, and 3 hours after the change something happened :). It now seems to show just the correct lines! It excluded the rest of the lines, but the transform I created to filter events is not working anymore. I created it to index only the log files related to job ID 23154800; before this change, only the log files related to that job were indexed, and the other log files were redirected to the null queue. But now, after this change, that filtering no longer seems to take effect.

My props file contains:
TRANSFORMS-set= setNullJob,setParsingJob

and the transforms file contains:

[setNullJob]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setParsingJob]
REGEX = 23154800
DEST_KEY = queue
FORMAT = indexQueue

Is it possible to combine event filtering with line filtering?

Thanks and regards,
Danillo Pavan


danillopavan
Communicator

Checking just now, I verified that only 4 new events were indexed, all of them related to other jobs. It is strange; I believe the event filtering is no longer working.

Props file:

[logJobsSAP]
TRANSFORMS-set= setNullJob,setParsingJob
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=SOMEJUNK
SEDCMD-preserveJobexce=s/(\++\s+)(.+ executed.+)/= \2/
SEDCMD-removepluslines=s/(\s*\++.+)//g
SEDCMD-removejunk=s/(\=\=+[\r\n]*)//g
SEDCMD-removejunk1=s/(\=\s+\w+\s+:.+[\r\n])//g
SEDCMD-removejunk2=s/[\r\n]\s*[A-z]+.+//g
TIME_PREFIX=\=\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z

Transforms file:

[setNullJob]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setParsingJob]
REGEX = 23154800
DEST_KEY = queue
FORMAT = indexQueue


danillopavan
Communicator

I was taking a look at the topic below, and I found that the solution was to include the REGEX expression in the transforms file instead of props, because it seems the SEDCMD entries in the props file are executed prior to the TRANSFORMS- entries.
Is that correct? Could you please help me include these REGEX expressions in the transform?

https://answers.splunk.com/answers/108452/filter-events-and-use-sedcmd.html


danillopavan
Communicator

I'm not sure if it's the best option in terms of performance, but I have included in the transform a REGEX expression that matches words which remain in the event after the SED entries run.

So, to summarize, it works in the following way:

First the SED entries in the props file replace strings matched by regular expressions (the unwanted information is replaced with an empty string), and then the entire event is filtered via the null queue using the transforms file.
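
A minimal sketch of that arrangement in the transforms file (the REGEX in setParsingJob is illustrative; it should match a string that still appears in the kept lines after the SEDCMDs run, such as the job name on the EEWO1061I line):

[setNullJob]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setParsingJob]
REGEX = IFIPD_DECOUPLE_NFE
DEST_KEY = queue
FORMAT = indexQueue

The order in TRANSFORMS-set matters: setNullJob routes every event to the null queue first, and setParsingJob then overrides the destination back to the index queue for events that match.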

Many thanks for all support here!!!

Danillo Pavan


danillopavan
Communicator

I tried it, but nothing was indexed. No events were indexed after this change.

It's important to mention that I am also using a transforms file to filter the events. These logs refer to various jobs, so I am using the transform to index only the log files related to the job whose ID is 23154800.

My current props file is:

[XXXXXX]
TRANSFORMS-set= setNullJob,setParsingJob
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE=SOMEJUNK
SEDCMD-preserveJobexce=s/(\++\s+)(.+ executed.+)/= \2/
SEDCMD-removepluslines=s/(\s*\++.+)//g
SEDCMD-removejunk=s/(\=\=+[\r\n]*)//g
SEDCMD-removejunk1=s/(\=\s+\w+\s+:.+[\r\n])//g
SEDCMD-removejunk2=s/[\r\n]\s*[A-z]+.+//g
TIME_PREFIX=\=\s+\w{3}
TIME_FORMAT=%m/%d/%y %H:%M:%S %Z

And the transforms file is:

[setNullJob]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setParsingJob]
REGEX = 23154800
DEST_KEY = queue
FORMAT = indexQueue


somesoni2
Revered Legend

Would you mind sharing a sample event, highlighting which lines need to be removed? My initial guess is to use the SEDCMD setting on the HF/indexers.
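
For reference, SEDCMD is a props.conf setting that applies a sed-style substitution to the raw event at parse time. A minimal sketch (the stanza name and regex are placeholders):

[YourSourceType]
SEDCMD-dropjunk = s/<regex matching the unwanted lines>//g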


somesoni2
Revered Legend

@danillopavan,

It seems your last comment may have some sensitive data (server names, IP addresses). Would you mind reposting it with that information masked? I'll then reject your previous comment, which has the sensitive data.


danillopavan
Communicator

Hello somesoni, sure.

Below is one example of the content of the log file being read. Each event has the following multiline content:

===============================================================
= JOB : R3BRP#DECOUPLE_NFE[(0006 12/29/17),(0AAAAAAAAAAIO5BA)].CL_S09_IFIPD_DECOUPLE_NFE_R3BRP_01
= USER : tws 631/S/COORD_HORTOLANDIA/
= JCLFILE : / -job IFIPD_DECOUPLE_NFE -user FF_PRO1 -i 23154800 -c a
= Job Number: 9177522

= Fri 12/29/17 16:10:53 BRST

+++ IBM Tivoli Workload Scheduler for Applications, method R3BATCH 8.5.0 (patchrev 1 - 16:42:24 Jun 13 2014)
+++ is called with following parameters:
+++ EEWO1031I The Tivoli Workload Scheduler home directory was found: ./..
+++ EEWO1027I The RFC connection is established: (1)
+++ EEWO1023I Started the R/3 job at the following date and time: 12/29-16:10 : IFIPD_DECOUPLE_NFE, 16105502
Fri Dec 29 16:10:53 2017
+++ EEWO1007I The job status has been set to EXEC: IFIPD_DECOUPLE_NFE 16105502
+++ EEWO1006I Job status: IFIPD_DECOUPLE_NFE 16105502 FINISHED
+++ EEWO1061I Job IFIPD_DECOUPLE_NFE with job ID 16105502 was executed on SAP application server XXXXXXXXXX.
+++ EEWO1048I Retrieving the joblog of a job:: IFIPD_DECOUPLE_NFE , 16105502
*** WARNING 914 *** EEWO0914W An internal error has occurred. Either the joblog or the job protocol for the following job does not exist:
Job name: IFIPD_DECOUPLE_NFE
Job ID: 16105502.
*** WARNING 904 *** EEWO0904W The program could not copy the joblog to stdout.
*** WARNING 914 *** EEWO0914W An internal error has occurred. Either the joblog or the job protocol for the following job does not exist:
Job name: IFIPD_DECOUPLE_NFE
Job ID: 16105502.
+++ EEWO1012I BDC sessions are complete at: 12/29-16:11 : 0

+++ EEWO1017I The job completed normally at the following date and time: 12/29-16:11

= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 0
= User Time (Seconds) : 0

= Fri 12/29/17 16:11:27 BRST

And I would like to keep just a few lines in a single event. Just the lines below:

= Job Number: 9177522
= Fri 12/29/17 16:10:53 BRST
+++ EEWO1061I Job IFIPD_DECOUPLE_NFE with job ID 16105502 was executed on SAP application server XXXXXXXXXX
= Exit Status : 0
= System Time (Seconds) : 0 Elapsed Time (Minutes) : 0
= User Time (Seconds) : 0
= Fri 12/29/17 16:11:27 BRST


micahkemp
Champion

Can you note what makes the +++ line that you want included different from those that you don't? I could guess, but it would only be a guess.

As for the other lines, this regex appears to match the ones that you want: ^= [A-Z][a-z].*$. I haven't yet turned that into part of a search command.
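
At search time, one way to try it would be rex in sed mode, deleting every line that does not start with a wanted pattern (just a sketch; the \+{3} EEWO1061I alternative is my guess at also keeping the wanted +++ line):

<your base search>
| rex mode=sed "s/(^|[\r\n]+)(?!(= [A-Z][a-z])|(\+{3} EEWO1061I)).*//g"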


danillopavan
Communicator

I really didn't notice anything in particular that makes these lines different from the others. I just want to select them because they contain important information that I want to extract; all the other rows are junk that I would like to delete. Initially I thought about using a transforms file to filter to the null queue; however, I believe that mechanism filters entire events rather than individual lines within an event.
With regex, what is your idea? To use it in the props file or the transforms file?

Thanks and regards,
Danillo Pavan
