Line Breaking: Regex not recognized, not breaking using my defined regex

shariinPH
Contributor

Hi all,

I have a log sample here that I need to break into events.
I have already tried LINE_BREAKER and BREAK_ONLY_BEFORE:

LINE_BREAKER=\w+\d+\|\w+_\w+_\w+\s+\d+/\d+/\d+\|\d+\|\d+\|\d+\|\d+\|\w+\s+-------------------------------------------------------------------------------- 

AND

BREAK_ONLY_BEFORE\w+\d+\|\w+_\w+_\w+\s+\d+/\d+/\d+\|\d+\|\d+\|\d+\|\d+\|\w+\s+--------------------------------------------------------------------------------

My event should break before (for example)

SMSMSMSM|REALITY0|20150325|060128|20150325|061116|Completed

--------------------------------------------------------------------------------

but the regex is not working.
Please refer to the attachment for my sample log.

0 Karma

jeffland
SplunkTrust

Before you can break lines, you need to know exactly where you want each break. If the first thing in a new event is not consistently the same, you need to work out another way to identify event starts reliably. I'm assuming that there is an unbounded number of possible "words" at the beginning of a new event, so the only thing we can do is rely on the pattern that appears before the 80 - characters (given that there are always exactly that many). Here is my attempt; see whether it does what you want at https://regex101.com/ (you can paste the regex and your log there and watch it work live, which is probably better than trying it in your props.conf right away).

([\r\n]+)\w+\|.*\|\d*\|\d*\|\d*\|\d*\|\w+\n\-{80}

What this does is basically look for a linebreak followed by a word, then optionally anything between some pipes, ended by a word, a newline and 80 - characters. What helped me a lot was this blog post: http://blogs.splunk.com/2014/04/23/its-that-time-again/
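If regex101 is not at hand, the same check can be sketched in a few lines of Python. This is only a rough stand-in for what Splunk does: the sample lines are a shortened excerpt modelled on the log posted in this thread, and Python's re module is assumed to treat this particular pattern the same way as Splunk's regex engine.

```python
import re

# Regex from the post above, copied verbatim.
LINE_BREAKER = r"([\r\n]+)\w+\|.*\|\d*\|\d*\|\d*\|\d*\|\w+\n\-{80}"

# Shortened excerpt modelled on the sample log in this thread.
SAMPLE = "\n".join([
    "KARPRFG|P|X|",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    "-" * 80,
    "ABCDEFG|S|03000211|",
]) + "\n"

match = re.search(LINE_BREAKER, SAMPLE)

# Group 1 is the line break that would mark the event boundary.
print(repr(match.group(1)))

# The text right after group 1 is where the new event would start.
print(SAMPLE[match.end(1):].splitlines()[0])
```

The second print should show the SMSMSMSM|...|Completed line, i.e. the line that sits directly above the 80 - characters.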

Hope this is in the right direction. I do not know how the long part of

PA_COMP|P|MWSI|
PA_CYCLE|P|201503|
PA_LAUFI|P|CV0322|
PA_PORTN|P|08_BG22|
PA_STATR|P||
SO_IDID|S|127137382|
SO_IDID|S|127137384|
SO_IDID|S|127137386|
... 

at the beginning is supposed to be indexed; right now it belongs to the event above it.

0 Karma

shariinPH
Contributor

Hi @jeffland, I will check this and let you know what happens. Thanks 😄

0 Karma

shariinPH
Contributor

Hi @jeffland, it is still not working. Have you tried indexing the log file I provided?

0 Karma

jeffland
SplunkTrust

Yeah, it works fine for me. Although I have to say, your timestamps are a mess.
But I have found something even prettier:

(\-{80}[\r\n]+)

This makes all those - disappear as well. If this does not work for you, then I suspect there is something wrong with the way you're trying to apply the settings. Did you define a new custom sourcetype?
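The effect of this shorter pattern can be approximated in Python. The helper split_events below is hypothetical, not part of Splunk: it assumes the documented LINE_BREAKER behaviour that the text matched by the first capture group is discarded, that the previous event ends where the group starts, and that the next event begins where the group ends. The sample is a shortened excerpt modelled on the log in this thread.

```python
import re


def split_events(text, line_breaker):
    """Rough emulation of LINE_BREAKER (an assumption, not Splunk's code):
    drop whatever the first capture group matched and cut the text there."""
    events, start = [], 0
    for m in re.finditer(line_breaker, text):
        events.append(text[start:m.start(1)])
        start = m.end(1)
    events.append(text[start:])
    return [e for e in events if e]


DASHES = "-" * 80
SAMPLE = "\n".join([
    "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed",
    DASHES,
    "ABCDEFG|S|03000036|",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    DASHES,
    "ABCDEFG|S|03000211|",
    "MASSAKT|P||",
]) + "\n"

events = split_events(SAMPLE, r"(\-{80}[\r\n]+)")
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event, end="")
```

Under these assumptions the dashes vanish from every event, and each ...|Completed line ends up as the last line of the event before it rather than the first line of its own block.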

0 Karma

shariinPH
Contributor

Hello @jeffland, I'm trying to customize my sourcetype when indexing the log file. I wonder why it doesn't work for me.

0 Karma

shariinPH
Contributor

@jeffland, would you mind posting your props.conf for the sourcetype you used? That would help me a lot to understand what you did with the line break.

0 Karma

jeffland
SplunkTrust

In /etc/system/local/props.conf, I have

[temp_dummy_line]
LINE_BREAKER = (\-{80}[\r\n]+)
SHOULD_LINEMERGE = false
category = Custom
disabled = false
pulldown_type = true

When I import your logfile, I select Custom -> temp_dummy_line from the sourcetype menu, and this gives me these very nice events:
http://postimg.org/image/u1h31evzj/
I don't know how your timestamps work, but I even tried to add the following two lines to the same props.conf stanza:

DATETIME_CONFIG = /etc/temp_linebreak.xml
MAX_TIMESTAMP_LOOKAHEAD = 0

And in the temp_linebreak.xml, I put

<datetime>
    <define name="time" extract="hour, minute, second">
        <text><![CDATA[20\d{6}\|(\d{2})(\d{2})(\d{2})]]></text>
    </define>
    <define name="date" extract="year, month, day">
        <text><![CDATA[20(\d{2})(\d{2})(\d{2})]]></text>
    </define>
    <timePatterns>
        <use name="time"/>
    </timePatterns>
    <datePatterns>
        <use name="date"/>
    </datePatterns>
</datetime>

This may be the wrong interpretation of your timestamps, but at least every event has a timestamp now.
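A quick way to see what the two patterns from temp_linebreak.xml pick up is to run them against one header line in Python. This is a sketch only: it checks the regular expressions themselves, not how Splunk combines them into a timestamp, and the header line is the example from the original question.

```python
import re

# Patterns copied from the <text> elements of temp_linebreak.xml.
TIME_RE = re.compile(r"20\d{6}\|(\d{2})(\d{2})(\d{2})")
DATE_RE = re.compile(r"20(\d{2})(\d{2})(\d{2})")

HEADER = "SMSMSMSM|REALITY0|20150325|060128|20150325|061116|Completed"

hour, minute, second = TIME_RE.search(HEADER).groups()
year, month, day = DATE_RE.search(HEADER).groups()

print(f"time: {hour}:{minute}:{second}")    # time: 06:01:28
print(f"date: 20{year}-{month}-{day}")      # date: 2015-03-25
```

Both patterns take the first date/time pair on the line (20150325|060128); the second pair (20150325|061116) is ignored.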

shariinPH
Contributor

@jeffland I will try this. By the way, thank you for the effort you put into the timestamps. I will get back to you in a while.

0 Karma

shariinPH
Contributor

Hello @jeffland, it works, but there is a misunderstanding between us.
What you meant is this: http://postimg.org/image/fx32ptft5/
What you did is break the events after each line of long dashes (---...---).

But what I want my event to be is this: http://postimg.org/image/6ssmefynl/
I enclosed the event I want in a red rectangle.

Please bear with me.
Thank you very, very much.

0 Karma

jeffland
SplunkTrust

You're welcome. Any help I can give is training for me.
Ah, so the parts with many lines of PA_NOTIF_... and ABCDEFG_... belong to the event before that. Does this also apply to the first event in your log, i.e. does the long part of PA_COMP... belong to GARETTE...? And what about the first EMEM1... which is not divided from the first PA_COMP... by 80 - characters, does it not belong to the long part of PA_COMP... as well but is indeed also a new event? If the answer is yes to all those questions, then this is your regex:

([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}

This looks for a linebreak (which will mark your new event), six instances of | with something (or nothing) between them followed by a word (which so far is "Completed" in your data), a newline and 80 - characters.
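To try this outside Splunk, the behaviour can be approximated with a small Python sketch. The helper split_events is hypothetical and only emulates the documented LINE_BREAKER rule (the first capture group is discarded and marks the boundary); the sample is a shortened excerpt modelled on the log in this thread.

```python
import re


def split_events(text, line_breaker):
    """Rough emulation of LINE_BREAKER (an assumption, not Splunk's code):
    drop whatever the first capture group matched and cut the text there."""
    events, start = [], 0
    for m in re.finditer(line_breaker, text):
        events.append(text[start:m.start(1)])
        start = m.end(1)
    events.append(text[start:])
    return [e for e in events if e]


DASHES = "-" * 80
SAMPLE = "\n".join([
    "SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed",
    DASHES,
    "ABCDEFG|S|03000036|",
    "BEGABL|S|03/01/2015|03/29/2015",
    "MASSAKT|P||",
    "SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed",
    DASHES,
    "ABCDEFG|S|03000211|",
    "MASSAKT|P||",
]) + "\n"

# Regex from the post above, copied verbatim.
LINE_BREAKER = r"([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}"

events = split_events(SAMPLE, LINE_BREAKER)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} starts with: {event.splitlines()[0]}")
```

On this excerpt each event starts at a ...|Completed line and keeps the dashes plus the detail lines that follow. One thing to keep in mind is that [^|]* can also match across line breaks, so a different mix of lines before a header could behave differently from this small sample.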

Hope this is it 🙂

shariinPH
Contributor

Hello @jeffland, I will definitely try this one 🙂

0 Karma

shariinPH
Contributor

Hello again @jeffland. I used the line breaker you provided, and what I get is this: http://postimg.org/image/ip8wbeti1/

My props.conf:

[jm_dummy]
LINE_BREAKER = ([\r\n]+)(?:[^|]*\|){6}\w*\n\-{80}
SHOULD_LINEMERGE = false
category = Custom
disabled = false
pulldown_type = true

Did you get the same output?

0 Karma

shariinPH
Contributor

THIS WORKS @jeffland!! 🙂 Amazing! What I used is the regex (?:[^|]*\|){6}\w* and here's what I got: http://postimg.org/image/j4y1q82fr/full/

shariinPH
Contributor

Thank you very much. You've been so helpful, @jeffland.

0 Karma

jeffland
SplunkTrust

Very good, glad I could help.

0 Karma

jeffland
SplunkTrust

I haven't fully understood where in that file you want line breaks. Exactly before the date inside a line? On the many ---? You should try your regular expressions at https://regex101.com/; they have a nice visualization. Your code, for example, has unescaped delimiters.

0 Karma

shariinPH
Contributor

Hi @jeffland, here's a sample:
SMSMSMSM|REALITY0|20150325|061528|20150325|062347|Completed
--------------------------------------------------------------------------------
ABCDEFG|S|03000036|
ABCDEFG|S|03000040|
ABCDEFG|S|03000073|
ABCDEFG|S|03000076|
ABCDEFG|S|03000080|
ABCDEFG|S|03000081|
ABCDEFG|S|03000091|
ABCDEFG|S|03000092|
ABCDEFG|S|03000093|
ABCDEFG|S|03000095|
ABCDEFG|S|03000097|
ABCDEFG|S|03000103|
ABCDEFG|S|03000104|
ABCDEFG|S|03000146|
ABCDEFG|S|03000160|
ABCDEFG|S|03000176|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||
SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed
--------------------------------------------------------------------------------
ABCDEFG|S|03000211|
ABCDEFG|S|03000212|
ABCDEFG|S|03000215|
ABCDEFG|S|03000219|
ABCDEFG|S|03000220|
ABCDEFG|S|03000245|
ABCDEFG|S|03000256|
ABCDEFG|S|03000258|
ABCDEFG|S|03000283|
ABCDEFG|S|03000325|
ABCDEFG|S|03000360|
ABCDEFG|S|03000362|
ABCDEFG|S|03000370|
ABCDEFG|S|03000371|
ABCDEFG|S|03000600|
ABCDEFG|S|03000620|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||

0 Karma

shariinPH
Contributor

And I want my events to break like this:

SMSMSMSM|REALITY0|20150325|061628|20150325|062401|Completed

ABCDEFG|S|03000211|
ABCDEFG|S|03000212|
ABCDEFG|S|03000215|
ABCDEFG|S|03000219|
ABCDEFG|S|03000220|
ABCDEFG|S|03000245|
ABCDEFG|S|03000256|
ABCDEFG|S|03000258|
ABCDEFG|S|03000283|
ABCDEFG|S|03000325|
ABCDEFG|S|03000360|
ABCDEFG|S|03000362|
ABCDEFG|S|03000370|
ABCDEFG|S|03000371|
ABCDEFG|S|03000600|
ABCDEFG|S|03000620|
ABLESGR|P|01|
ANLAGE|S||
BEGABL|S|03/01/2015|03/29/2015
COUNTREQ|P| 0|
EXTNR|P||
GEPLAART|P|01|
GPLARTTS|P||
IGNPREP|P|X|
KARPRFG|P|X|
MASSAKT|P||

0 Karma

jeffland
SplunkTrust

I'm sorry, that didn't make it much clearer. We need to find something that identifies a breakpoint. Is it only on lines like SMSMSMSM|...|Completed
-----...---
? Or is it also on EMEM1|...|Completed
-----...---
?

0 Karma

shariinPH
Contributor

I'm sorry about that, @jeffland. Actually, the breakpoint should only be before
\w+(any word)|.....|\w+(any word also)\s+ -----------...------------
so every time Splunk sees \w+|...|\w+\s+ -----------...------------ it should break the events. I hope that makes it clearer now. Please do help me; I need to get this done 😞

0 Karma