Splunk Search

Regex to find Multi Line pattern

Neekheal
Observer

Hi,


I am having trouble understanding how to extract a multiline pattern from a single event.

I have a log file in which I am searching for this pattern, which is scattered across multiple lines:
123456789102BP Tank: Bat from Surface = #07789*K00C0**************************************** 00003453534534534

****after Multiple Lines***
123456789107CSVSentinfo:L00Show your passport

****after Multiple Lines***

123456789110CSVSentinfo Data:z800
****after Multiple Lines***

123456789113CSVSentinfoToCollege:

****after Multiple Lines***

123456789117CSVSentinfoFromCollege:

****after Multiple Lines***

123456789120CSVSentinfo:G7006L

****after Multiple Lines***

123456789122CSVSentinfo:A0T0

****after Multiple Lines***

123456789124BP Tank: Bat to Surface L000passportAccepted

 

I have tried the query below to find all the occurrences, but no luck:
index=khisab_ustri  sourcetype=sosnmega  "*BP Tank: Bat from surface = *K00C0*" |dedup _time
|rex field=_raw "(?ms)(?<time_string>\d{12})BP Tank: Bat from Surface .*K00C0\d{21}(?<kmu_str>\d{2})*"
|rex field=_raw "(?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport*"
|rex field=_raw "(?<CP_sTime>\d{12})CSVSentinfo Data:z800*"
|rex field=_raw "(?<MTB_sTime>\d{12})CSVSentinfoToCollege:*"
|rex field=_raw "(?<MFB_sTime>\d{12})CSVSentinfoFromCollege:*"
|rex field=_raw "(?<PR_sTime>\d{12})CSVSentinfo:G7006L*"
|rex field=_raw "(?<JR_sTime>\d{12})CSVSentinfo:A0T0*"
|rex field=_raw "(?<MR_sTime>\d{12})BP Tank: Bat to Surface =.+L000passportAccepted*"
|table (PC_sTime- time_string),(CP_sTime- PC_sTime),(MTB_sTime-CP_sTime),(MFB_sTime-MTB_sTime),(PR_sTime- MFB_sTime),(JR_sTime-PR_sTime),(MR_sTime-JR_sTime)


Sample data:
123456789102BP Tank: Bat from Surface = #07789*K00C0**************************************** 00003453534534534
123456789103UniverseToMachine\0a<Ladbrdige>\0a <SurfaceTake>GOP</Ocnce>\0a <Final_Worl-ToDO>Firewallset</KuluopToset>\0a</
123456789105SetSurFacetoMost>7</DecideTomove>\0a <TakeaKooch>&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;</SurfaceBggien>\0a <Closethe Work>0</Csloethe Work>\0a
123456789107CSVSentinfo:L00Show your passport
123456789108BP Tank: Bat from Surface = close ticket
123456789109CSVSentinfo:Guide iunit
123456789110CSVSentinfo Data:z800
123456789111CSVGErt Infro"8900
123456789112CSGFajsh:984
123456789113CSVSentinfoToCollege:
123456789114CSVSentinfo Data:z800
123456789115CSVSentinfo Data:z800
123456789116Sem startedfrom Surface\0a<Surafce have a data>\0a <Surfacecame with Data>Ladbrdige</Ocnce>\0a <Ladbrdige>Ocnce</Final_Worl>\0a <KuluopToset>15284</DecideTomove>\0a <SurafceCall>\0a <wait>\0a <wating>EventSent</SurafceCall>\0a </wait>\0a </sa>\0a</Surafce have a data>\0a\0a
123456789117CSVSentinfoFromCollege:
123456789118CSVSentinfo:sadjhjhisd
123456789119CSVSentinfo:Loshy890
123456789120CSVSentinfo:G7006L
123456789121CSVSentinfo:8shhgbve
123456789122CSVSentinfo:A0T0
123456789123CSVSentinfo Data:accepted
123456789124BP Tank: Bat to Surface L000passportAccepted


0 Karma

yuanliu
SplunkTrust

The attempted code shows several misunderstandings. Once these are addressed, the regex can be fixed.

  1. Most importantly, the table command does not evaluate expressions.  It can only tabulate fields that already have values.
  2. Second, there are several apparent attempts to use the asterisk (*) as a wildcard in the regex.  It is not one.  In regex, * is a repetition quantifier that applies to the preceding token.  What you meant is perhaps .*, so I changed the patterns accordingly.
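
Both points can be tried on a made-up one-line event. This is only a minimal sketch: the event text after "passport" and the field names star_match, dotstar_match and extra_chars are hypothetical, chosen for illustration.

```spl
| makeresults
| eval _raw = "123456789107CSVSentinfo:L00Show your passport and more"
``` a trailing * only repeats the final t, so this match stops at passport ```
| rex "(?<star_match>CSVSentinfo:L00Show your passport*)"
``` .* consumes the rest of the line ```
| rex "(?<dotstar_match>CSVSentinfo:L00Show your passport.*)"
``` table only displays fields, so the difference is computed with eval first ```
| eval extra_chars = len(dotstar_match) - len(star_match)
| table star_match dotstar_match extra_chars
```

Here star_match should end at "passport", while dotstar_match should also include " and more", so extra_chars should come out as 9.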

Besides these, the first line in the sample also cannot match \d{21}(?<kmu_str>\d{2}), because the sample has non-numeric characters (asterisks) immediately after BP Tank: Bat from Surface = #07789*K00C0.  To make the rest of this answer meaningful, I replaced those characters with numerals in the emulation.  What you should be using is perhaps something like:

 

index=khisab_ustri  sourcetype=sosnmega  "*BP Tank: Bat from surface = *K00C0*"
|rex max_match=0 "(?ms)(?<time_string>\d{12})BP Tank: Bat from Surface .*K00C0\d{21}(?<kmu_str>\d{2})*"
|rex max_match=0 "(?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport.*"
|rex max_match=0 "(?<CP_sTime>\d{12})CSVSentinfo Data:z800.*"
|rex max_match=0 "(?<MTB_sTime>\d{12})CSVSentinfoToCollege:.*"
|rex max_match=0 "(?<MFB_sTime>\d{12})CSVSentinfoFromCollege:.*"
|rex max_match=0 "(?<PR_sTime>\d{12})CSVSentinfo:G7006L.*"
|rex max_match=0 "(?<JR_sTime>\d{12})CSVSentinfo:A0T0.*"
|rex max_match=0 "(?<MR_sTime>\d{12})BP Tank: Bat to Surface .*L000passportAccepted.*"
| eval PC_minus_timestring = (PC_sTime- time_string),
  CP_minus_PC = mvmap(CP_sTime, (CP_sTime- PC_sTime)),
  MTB_minus_CP = (MTB_sTime-CP_sTime),
  MFB_minus_MTB = (MFB_sTime-MTB_sTime),
  PR_minus_MFB = (PR_sTime- MFB_sTime),
  JR_minus_PR = (JR_sTime-PR_sTime),
  MR_minus_JR = (MR_sTime-JR_sTime)
| table *_minus_*

 

 

The modified sample data will give

CP_minus_PC  JR_minus_PR  MFB_minus_MTB  MR_minus_JR  PC_minus_timestring  PR_minus_MFB
3            2            4              2            5                    3
7
8
Some additional pointers

  • You should not use dedup on _time.  If you need to do that, something is wrong with your event data.  Fix that first.
  • The rex command operates on _raw by default, so there is no need to specify field=_raw.
  • Some fields can have multiple matches, so I added max_match=0.  Read the rex documentation about its options.
  • Your sample data does not contain all the fields you are trying to extract.
  • Your sample SPL does not use the kmu_str field that it extracts.
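
The rex pointers can be checked on a small emulation. This is a sketch using three lines taken from the sample. The field names first_CP, all_CP and all_CP_count are hypothetical.

```spl
| makeresults
| eval _raw = "123456789110CSVSentinfo Data:z800
123456789114CSVSentinfo Data:z800
123456789115CSVSentinfo Data:z800"
``` no field= argument, so rex reads _raw, and the default max_match=1 keeps only the first hit ```
| rex "(?<first_CP>\d{12})CSVSentinfo Data:z800"
``` max_match=0 keeps every hit as a multivalue field ```
| rex max_match=0 "(?<all_CP>\d{12})CSVSentinfo Data:z800"
| eval all_CP_count = mvcount(all_CP)
| table first_CP all_CP all_CP_count
```

first_CP should contain only 123456789110, while all_CP should contain all three timestamps and all_CP_count should be 3.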

Here is an emulation of the modified sample data.  Play with it and compare with real data:

 

| makeresults
| eval _raw = "123456789102BP Tank: Bat from Surface = #07789*K00C012345678901234567890178 00003453534534534
123456789103UniverseToMachine\\0a<Ladbrdige>\\0a <SurfaceTake>GOP</Ocnce>\\0a <Final_Worl-ToDO>Firewallset</KuluopToset>\\0a</
123456789105SetSurFacetoMost>7</DecideTomove>\\0a <TakeaKooch>&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;&#32;</SurfaceBggien>\\0a <Closethe Work>0</Csloethe Work>\\0a
123456789107CSVSentinfo:L00Show your passport
123456789108BP Tank: Bat from Surface = close ticket
123456789109CSVSentinfo:Guide iunit
123456789110CSVSentinfo Data:z800
123456789111CSVGErt Infro\"8900
123456789112CSGFajsh:984
123456789113CSVSentinfoToCollege:
123456789114CSVSentinfo Data:z800
123456789115CSVSentinfo Data:z800
123456789116Sem startedfrom Surface\\0a<Surafce have a data>\\0a <Surfacecame with Data>Ladbrdige</Ocnce>\\0a <Ladbrdige>Ocnce</Final_Worl>\\0a <KuluopToset>15284</DecideTomove>\\0a <SurafceCall>\\0a <wait>\\0a <wating>EventSent</SurafceCall>\\0a </wait>\\0a </sa>\\0a</Surafce have a data>\\0a\\0a
123456789117CSVSentinfoFromCollege:
123456789118CSVSentinfo:sadjhjhisd
123456789119CSVSentinfo:Loshy890
123456789120CSVSentinfo:G7006L
123456789121CSVSentinfo:8shhgbve
123456789122CSVSentinfo:A0T0
123456789123CSVSentinfo Data:accepted
123456789124BP Tank: Bat to Surface L000passportAccepted"
``` the above emulates
index=khisab_ustri  sourcetype=sosnmega  "*BP Tank: Bat from surface = *K00C0*"
```

 

 

0 Karma

PickleRick
SplunkTrust

+1 on @ITWhisperer's question. Is this all just one huge chunk of data ingested as a single event, in fact containing multiple separate, intertwined "streams" of data, or are those separate events?

0 Karma

Neekheal
Observer

Yes, different events.

I am at a very early stage with SPL, so I am still trying to figure it out.
TIA

0 Karma

ITWhisperer
SplunkTrust

Just to be clear, are you saying that your sample data (as shown) has been ingested as a single event and that there are other lines in the event which are unrelated or at least you want to ignore?

0 Karma

Neekheal
Observer

Yes, they are multiple events.

0 Karma

Neekheal
Observer

What should the rex command be to skip newlines, letters, numbers, and special characters, and then search for and extract

 "(?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport*"

 

0 Karma

inventsekar
SplunkTrust

Hi @Neekheal 

If the text is literal and the same in all logs, you can include it directly in the rex.

Let's say "CSVSentinfo:L00Show your passport" is constant in all logs. Then you keep it as part of the rex command:

 "(?<PC_sTime>\d{12})CSVSentinfo\:L00Show your passport.*(?P<Field2>rex cmd)"

To match newline and/or tab characters, please include "\n" and "\t" in the pattern.
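
As a rough illustration of newline handling, the sketch below runs three patterns against a two-line emulation taken from the sample. The (?s) variant is offered as an alternative to spelling out the newline characters, and the field names same_line, across_lines and next_time are hypothetical.

```spl
| makeresults
| eval _raw = "123456789107CSVSentinfo:L00Show your passport
123456789108BP Tank: Bat from Surface = close ticket"
``` without (?s), the dot stops at the newline ```
| rex "(?<same_line>passport.*)"
``` with (?s), the dot also matches newlines, so the capture runs into line two ```
| rex "(?s)(?<across_lines>passport.*)"
``` an explicit character class steps over the line break instead ```
| rex "passport[\r\n]+(?<next_time>\d{12})"
| table same_line across_lines next_time
```

same_line should stop at "passport", across_lines should continue through the second line, and next_time should be 123456789108.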

 

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !
0 Karma

inventsekar
SplunkTrust

Hi @Neekheal, all the rex commands should be written as a single rex command.
I mean: after the first pattern, add a pattern that matches the extra characters in between, then the second pattern, then another pattern for the extra characters, and so on. That is, change

index=khisab_ustri  sourcetype=sosnmega  "*BP Tank: Bat from surface = *K00C0*" |dedup _time
|rex field=_raw "(?ms)(?<time_string>\d{12})BP Tank: Bat from Surface .*K00C0\d{21}(?<kmu_str>\d{2})*"
|rex field=_raw "(?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport*"

to 

index=khisab_ustri  sourcetype=sosnmega  "*BP Tank: Bat from surface = *K00C0*" |dedup _time
|rex field=_raw "(?ms)(?<time_string>\d{12})BP Tank: Bat from Surface .*K00C0\d{21}(?<kmu_str>\d{2}) <<< some rex commands to match >>> (?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport*"
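
One hypothetical way to fill the "<<< some rex commands to match >>>" placeholder is a lazy .*? under the (?s) flag, which skips whatever lies between the two patterns, newlines included. The sketch below assumes the whole sequence sits in a single event and uses the numeral-substituted first line from the emulation posted earlier in this thread. The filler is an assumption, not necessarily what was intended here.

```spl
| makeresults
| eval _raw = "123456789102BP Tank: Bat from Surface = #07789*K00C012345678901234567890178 00003453534534534
123456789103UniverseToMachine
123456789107CSVSentinfo:L00Show your passport"
``` (?s) lets the lazy .*? cross line breaks between the two captures ```
| rex "(?s)(?<time_string>\d{12})BP Tank: Bat from Surface .*?K00C0\d{21}(?<kmu_str>\d{2}).*?(?<PC_sTime>\d{12})CSVSentinfo:L00Show your passport"
| eval PC_minus_timestring = PC_sTime - time_string
| table time_string kmu_str PC_sTime PC_minus_timestring
```

On this emulation, time_string should be 123456789102, kmu_str should be 78, PC_sTime should be 123456789107, and PC_minus_timestring should be 5. If the lines are actually separate events, a single rex cannot span them and this approach would not apply.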

 

thanks and best regards,
Sekar

0 Karma