Line Breaking: Regex not recognized, not breaking using my defined regex

shariinPH
Contributor

Hi All,

I have a log sample that I need to break into events.
I have already tried LINE_BREAKER and BREAK_ONLY_BEFORE:

LINE_BREAKER=\w+\d+\|\w+_\w+_\w+\s+\d+/\d+/\d+\|\d+\|\d+\|\d+\|\d+\|\w+\s+-------------------------------------------------------------------------------- 

AND

BREAK_ONLY_BEFORE\w+\d+\|\w+_\w+_\w+\s+\d+/\d+/\d+\|\d+\|\d+\|\d+\|\d+\|\w+\s+--------------------------------------------------------------------------------

My event should break before (for example)

SMSMSMSM|REALITY0|20150325|060128|20150325|061116|Completed

--------------------------------------------------------------------------------

but the regex is not working.
Please refer to the attachment for my sample log.

jeffland
SplunkTrust

Before you can linebreak something, you need to know exactly where and when you want a linebreak. If the first thing in a new event is not consistently the same, you need to work out another way to identify those elements reliably. I'm assuming that there is an unlimited number of possible "words" at the beginning of a new event, so the only thing we can do is rely on the pattern that occurs right before the 80 - characters (given that there are always exactly 80 of them). Here is my go at that; see if it does what you want at https://regex101.com/ (you can paste the regex and your log there and see it live in action, which is probably better than trying it out in your props.conf right away):

([\r\n]+)\w+\|.*\|\d*\|\d*\|\d*\|\d*\|\w+\n\-{80}

What this does is look for a linebreak followed by a word and a pipe, then anything up to another pipe, then four pipe-separated fields of optional digits, ending with a word, a newline and 80 - characters. What helped me a lot was this blog post: http://blogs.splunk.com/2014/04/23/its-that-time-again/
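
If you want to check where this pattern would place the event boundaries before touching props.conf, the behavior can be approximated in a few lines of Python. This is only a sketch under assumptions: `break_events` is a hypothetical helper, Python's `re` module stands in for Splunk's regex engine, the sample is abridged from lines quoted in this thread, and the only rule modeled is that the text matched by the first capture group is dropped and marks the boundary between two events.

```python
import re

# LINE_BREAKER value from the post above, unchanged.
LINE_BREAKER = r"([\r\n]+)\w+\|.*\|\d*\|\d*\|\d*\|\d*\|\w+\n\-{80}"

# Abridged sample assembled from lines quoted in this thread.
SAMPLE = "\n".join([
    "PA_COMP|P|MWSI|",
    "PA_CYCLE|P|201503|",
    "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed",
    "-" * 80,
    "ABCDEFG|S|03000036|",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    "-" * 80,
    "ABCDEFG|S|03000211|",
]) + "\n"


def break_events(raw, line_breaker):
    """Hypothetical helper: split raw text wherever the pattern matches,
    dropping whatever the first capture group matched."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event.strip()]


events = break_events(SAMPLE, LINE_BREAKER)
for number, event in enumerate(events, 1):
    lines = event.splitlines()
    print(number, lines[0], f"({len(lines)} lines)")
```

On this abridged sample the sketch produces three events: the leading PA_ lines on their own, then one event per ...|Completed header line together with the lines that follow it.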

Hope this is in the right direction. I do not know how the long part of

PA_COMP|P|MWSI|
PA_CYCLE|P|201503|
PA_LAUFI|P|CV0322|
PA_PORTN|P|08_BG22|
PA_STATR|P||
SO_IDID|S|127137382|
SO_IDID|S|127137384|
SO_IDID|S|127137386|
... 

at the beginning is supposed to be indexed; right now it belongs to the event above it.

shariinPH
Contributor

Hi @jeffland, I will check on this and let you know what happens. Thanks 😄

shariinPH
Contributor

Hi @jeffland, it is still not working. Have you tried indexing the log file I provided?

jeffland
SplunkTrust

Yeah, it works fine for me. Although I have to say, your timestamps are a mess.
But I have found something even prettier:

(\-{80}[\r\n]+)

This makes all those - disappear as well. If this does not work for you, then I suspect there is something wrong with the way you're trying to apply the settings. Did you define a new custom sourcetype?
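
To illustrate why the dashes disappear, here is a rough Python sketch. It assumes the only LINE_BREAKER behavior that matters here is that whatever the first capture group matches is discarded and the text on either side becomes separate events; `break_events` is a hypothetical helper and the sample is abridged from lines quoted in this thread.

```python
import re

# The whole pattern is the capture group, so the dashes and the
# following line ending are consumed as the event separator.
LINE_BREAKER = r"(\-{80}[\r\n]+)"

# Abridged sample assembled from lines quoted in this thread.
SAMPLE = "\n".join([
    "PA_COMP|P|MWSI|",
    "PA_CYCLE|P|201503|",
    "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed",
    "-" * 80,
    "ABCDEFG|S|03000036|",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    "-" * 80,
    "ABCDEFG|S|03000211|",
]) + "\n"


def break_events(raw, line_breaker):
    """Hypothetical helper: cut the text at every match and drop
    the part matched by capture group 1."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event.strip()]


events = break_events(SAMPLE, LINE_BREAKER)
for number, event in enumerate(events, 1):
    print(f"--- event {number} ---")
    print(event.rstrip())
```

With this pattern no event contains the dashed line any more, and each ...|Completed line ends up as the last line of the event before the break rather than the first line of the next one.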

shariinPH
Contributor

Hello @jeffland, I'm trying to define a custom sourcetype when indexing the log file. I wonder why it doesn't work for me.

shariinPH
Contributor

@jeffland, would you mind posting the props.conf for the sourcetype you used? That would help me a lot to understand what you did with the line break.

jeffland
SplunkTrust

In /etc/system/local/props.conf, I have

[temp_dummy_line]
LINE_BREAKER = (\-{80}[\r\n]+)
SHOULD_LINEMERGE = false
category = Custom
disabled = false
pulldown_type = true

When I import your logfile, I select Custom -> temp_dummy_line from the sourcetype menu, and this gives me these very nice events:
http://postimg.org/image/u1h31evzj/
I don't know how your timestamps work, but I even tried to add the following two lines to the same props.conf stanza:

DATETIME_CONFIG = /etc/temp_linebreak.xml
MAX_TIMESTAMP_LOOKAHEAD = 0

And in the temp_linebreak.xml, I put

<datetime>
    <define name="time" extract="hour, minute, second">
        <text><![CDATA[20\d{6}\|(\d{2})(\d{2})(\d{2})]]></text>
    </define>
    <define name="date" extract="year, month, day">
        <text><![CDATA[20(\d{2})(\d{2})(\d{2})]]></text>
    </define>
    <timePatterns>
        <use name="time"/>
    </timePatterns>
    <datePatterns>
        <use name="date"/>
    </datePatterns>
</datetime>

This may be the wrong interpretation of your timestamps, but at least every event has a timestamp now.
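
For anyone wondering what those two patterns pick out of a header line, here is a small Python sketch. It is not how Splunk processes datetime.xml internally; it simply applies the same two regular expressions with Python's `re` module. `extract_timestamp` is a hypothetical helper, and adding 2000 to the two-digit year is an assumption based on the literal 20 in front of the capture group.

```python
import re
from datetime import datetime

# The two patterns from the datetime.xml above, copied as-is.
TIME_RE = re.compile(r"20\d{6}\|(\d{2})(\d{2})(\d{2})")  # hour, minute, second
DATE_RE = re.compile(r"20(\d{2})(\d{2})(\d{2})")         # year, month, day


def extract_timestamp(event):
    """Hypothetical helper: build a datetime from the first match of each
    pattern, or return None if either pattern does not match."""
    time_match = TIME_RE.search(event)
    date_match = DATE_RE.search(event)
    if time_match is None or date_match is None:
        return None
    hour, minute, second = (int(part) for part in time_match.groups())
    year, month, day = (int(part) for part in date_match.groups())
    # Assumption: the captured two-digit year belongs to the 2000s.
    return datetime(2000 + year, month, day, hour, minute, second)


header = "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed"
print(extract_timestamp(header))
```

Because only the first match of each pattern is used in this sketch, the first date/time pair on the header line is taken and the second pair is ignored.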

shariinPH
Contributor

@jeffland I will try this, and thank you for the effort of working out the timestamps. I will get back to you in a while.

shariinPH
Contributor

Hello @jeffland, it works, but there is a misunderstanding between us.
What you did is this: http://postimg.org/image/fx32ptft5/
You break the event after every line of long dashes ---...---

But what I want my event to be is this: http://postimg.org/image/6ssmefynl/
I enclosed the event I want in a red rectangle.

Please bear with me.
Thank you very much.

jeffland
SplunkTrust

You're welcome. Any help I can give is training for me.
Ah, so the parts with many lines of PA_NOTIF_... and ABCDEFG_... belong to the event before that. Does this also apply to the first event in your log, i.e. does the long part of PA_COMP... belong to GARETTE...? And what about the first EMEM1... which is not divided from the first PA_COMP... by 80 - characters, does it not belong to the long part of PA_COMP... as well but is indeed also a new event? If the answer is yes to all those questions, then this is your regex:

([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}

This looks for a linebreak (which will mark your new event), six instances of | with something (or nothing) between them followed by a word (which so far is "Completed" in your data), a newline and 80 - characters.
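
As a quick sanity check outside of Splunk, the following Python sketch applies this pattern to an abridged sample built from lines quoted in this thread. The assumptions are the same as for any offline test: `break_events` is a hypothetical helper, Python's `re` module stands in for Splunk's regex engine, and only the rule "text matched by the first capture group is discarded and separates two events" is modeled.

```python
import re

# LINE_BREAKER value from the post above, unchanged.
LINE_BREAKER = r"([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}"

# Abridged sample assembled from lines quoted in this thread.
SAMPLE = "\n".join([
    "PA_COMP|P|MWSI|",
    "PA_CYCLE|P|201503|",
    "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed",
    "-" * 80,
    "ABCDEFG|S|03000036|",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    "-" * 80,
    "ABCDEFG|S|03000211|",
]) + "\n"


def break_events(raw, line_breaker):
    """Hypothetical helper: start a new event after each match of
    capture group 1 and drop the matched line ending itself."""
    events, start = [], 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event.strip()]


events = break_events(SAMPLE, LINE_BREAKER)
for number, event in enumerate(events, 1):
    lines = event.splitlines()
    print(number, "first:", lines[0], "| last:", lines[-1])
```

One thing to keep in mind: `[^|]*` also matches newline characters, so the six fields are not strictly confined to a single line. On this sample only the newline directly in front of a ...|Completed header satisfies the whole pattern, so each header starts a new event and the detail lines after the dashes stay with it.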

Hope this is it 🙂

shariinPH
Contributor

hello @jeffland will definitely try this one 🙂

shariinPH
Contributor

Hello again @jeffland. I used the line breaker you provided, and what I get is this: http://postimg.org/image/ip8wbeti1/

My props.conf:

[jm_dummy]
LINE_BREAKER = ([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}
SHOULD_LINEMERGE = false
category = Custom
disabled = false
pulldown_type = true

Did you get the same output?

shariinPH
Contributor

THIS WORKS @jeffland!! 🙂 Amazing! The regex I used is (?:[^|]*\|){6}\w* and here's what I got: http://postimg.org/image/j4y1q82fr/full/

shariinPH
Contributor

Thank you very much. You've been so helpful, @jeffland.

jeffland
SplunkTrust

Very good, glad I could help.

jeffland
SplunkTrust

I haven't fully understood where in that file you want linebreaks. Exactly before the date inside a line? On the many ---? You should try your regular expressions at https://regex101.com/, they have a nice visualization. Your code for example has unescaped delimiters.

shariinPH
Contributor

Hi @jeffland, here's a sample:
SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed
--------------------------------------------------------------------------------
ABCDEFG|S|03000036|
ABCDEFG|S|03000040|
ABCDEFG|S|03000073|
ABCDEFG|S|03000076|
ABCDEFG|S|03000080|
ABCDEFG|S|03000081|
ABCDEFG|S|03000091|
ABCDEFG|S|03000092|
ABCDEFG|S|03000093|
ABCDEFG|S|03000095|
ABCDEFG|S|03000097|
ABCDEFG|S|03000103|
ABCDEFG|S|03000104|
ABCDEFG|S|03000146|
ABCDEFG|S|03000160|
ABCDEFG|S|03000176|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||
SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed
--------------------------------------------------------------------------------
ABCDEFG|S|03000211|
ABCDEFG|S|03000212|
ABCDEFG|S|03000215|
ABCDEFG|S|03000219|
ABCDEFG|S|03000220|
ABCDEFG|S|03000245|
ABCDEFG|S|03000256|
ABCDEFG|S|03000258|
ABCDEFG|S|03000283|
ABCDEFG|S|03000325|
ABCDEFG|S|03000360|
ABCDEFG|S|03000362|
ABCDEFG|S|03000370|
ABCDEFG|S|03000371|
ABCDEFG|S|03000600|
ABCDEFG|S|03000620|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||

shariinPH
Contributor

and I want my events to break like this:

SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed

ABCDEFG|S|03000211|
ABCDEFG|S|03000212|
ABCDEFG|S|03000215|
ABCDEFG|S|03000219|
ABCDEFG|S|03000220|
ABCDEFG|S|03000245|
ABCDEFG|S|03000256|
ABCDEFG|S|03000258|
ABCDEFG|S|03000283|
ABCDEFG|S|03000325|
ABCDEFG|S|03000360|
ABCDEFG|S|03000362|
ABCDEFG|S|03000370|
ABCDEFG|S|03000371|
ABCDEFG|S|03000600|
ABCDEFG|S|03000620|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||

jeffland
SplunkTrust

I'm sorry, that didn't make it much clearer. We need to find something that identifies a breakpoint. Is it only on lines like SMSMSMSM|...|Completed
-----...---
? Or is it also on EMEM1|...|Completed
-----...---
?

shariinPH
Contributor

I'm sorry about that, @jeffland. Actually, the breakpoint should only be before
\w+(any word)|.....|\w+(any word also)\s+ -----------...------------
so every time Splunk sees \w+|...|\w+\s+ -----------...------------ it will break the events. I hope that makes it clearer. Please help me, I need to get this done 😞
