Hi Team,
I need help with a regex for the LINE_BREAKER attribute in props.conf.
My log is pipe-delimited (|), but it appears to arrive as one big event with no newline or carriage return characters, so I cannot use ([\r\n]+) or ^\w| patterns.
Our architecture requires this, and I won't be using the out-of-the-box psv sourcetype.
Can someone help me with the right LINE_BREAKER pattern to use?
Sample log
hostname|cluster_name|11/26/17 00:43:19|AB- 1|INFO| Retail.getCategoryListCodesFromProperties() retail Code List to show the link ::[02756, 2127]
hostname|cluster_name|11/26/17 00:49:28|AB-No Memory|object|||||||123467|123123123|01
hostname|cluster_name|11/26/17 00:51:42|AB-No Memory|object|||||||123455|123123123|00
hostname|cluster_name|11/26/17 01:04:28|AB-No Memory|object|||||||111111|123123123|01
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><wsse:Security xmlns:wsse="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-secext-1.0.xsd"><wsse:UsernameToken xmlns:wsse="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-secext-1.0.xsd" xmlns:wsu="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-utility-1.0.xsd"></soapenv:Body></soapenv:Envelope>
hostname|cluster_name|11/26/17 01:06:42|AB-No Memory|object|||||||222222|123123123|00
hostname|cluster_name|11/26/17 01:19:28|AB-No Memory|object|||||||333333|123123123|01
hostname|cluster_name|11/26/17 01:21:42|AB-No Memory|object|||||||555555|123123123|10
hostname|cluster_name|11/26/17 01:34:28|AB-No Memory|object|||||||777777|123123123|11
hostname|cluster_name|11/26/17 01:36:42|AB-No Memory|object|||||||111111|123123123|10
Updated SOAP event:
hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"></soapenv:Envelope>
This is a little ugly, but it should work. It breaks each line before the two fields prior to the timestamp field. Of course, it won't work if either hostname or cluster_name can begin with two digits, but you should be able to adjust for that.
LINE_BREAKER = ()\w+\|\w+\|\d\d\/
TIME_PREFIX = \w+\|\w+\|
TIME_FORMAT = %m/%d/%y %H:%M:%S
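For anyone who wants to check the pattern outside Splunk, here is a rough Python sketch. It only approximates LINE_BREAKER by cutting the text at the first capture group of every match; the split_events helper is hypothetical, Python's re is not Splunk's regex engine, and nothing else in the ingestion pipeline is modeled. It also assumes the records are separated by a newline, as they appear in the post, and the SOAP body is shortened.

```python
import re

# Pattern copied from the answer above; the empty group () marks where to cut.
LINE_BREAKER = re.compile(r"()\w+\|\w+\|\d\d/")


def split_events(raw, breaker):
    """Cut raw text at group 1 of every match, discarding what group 1 matched."""
    events, start = [], 0
    for m in breaker.finditer(raw):
        if m.start(1) > start:
            events.append(raw[start:m.start(1)])
        start = m.end(1)
    if start < len(raw):
        events.append(raw[start:])
    return events


# Assumption: records are newline-separated, and the SOAP XML is abbreviated.
raw = (
    "hostname|cluster_name|11/26/17 01:36:42|AB-No Memory|object|||||||111111|123123123|10\n"
    "hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request \n"
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"></soapenv:Envelope>\n'
    "hostname|cluster_name|11/26/17 01:06:42|AB-No Memory|object|||||||222222|123123123|00"
)

events = split_events(raw, LINE_BREAKER)
for number, event in enumerate(events, 1):
    print(number, repr(event))
```

In this sketch the SOAP line stays attached to the "Webservice SOAP Request" record, because nothing in the XML matches word|word|two digits and a slash.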
I am assuming that there is either whitespace or a pipe before hostname, so try this:
LINE_BREAKER = (^|\s|\|)(?:[^|]+\|){2}\d+\/\d+\/\d+\s+\d+:\d+:\d+\|
TIME_PREFIX = (?:[^|]+\|){2}
TIME_FORMAT = %m/%d/%y %H:%M:%S
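As a quick sanity check of how TIME_PREFIX and TIME_FORMAT fit together, here is a small Python sketch. It assumes Python's strptime directives behave the same way as Splunk's for %m, %d, %y, %H, %M and %S, and that the timestamp is always the fixed 17 characters seen in the samples; it is an illustration, not how Splunk extracts timestamps internally.

```python
import re
from datetime import datetime

# Settings copied from the answer above.
TIME_PREFIX = re.compile(r"(?:[^|]+\|){2}")
TIME_FORMAT = "%m/%d/%y %H:%M:%S"

# One record from the sample log in the question.
event = "hostname|cluster_name|11/26/17 00:49:28|AB-No Memory|object|||||||123467|123123123|01"

# The prefix consumes the first two pipe-delimited fields (hostname and cluster_name).
prefix = TIME_PREFIX.match(event)

# Assumption: the timestamp is always 17 characters, "MM/DD/YY HH:MM:SS".
stamp = event[prefix.end():prefix.end() + 17]
parsed = datetime.strptime(stamp, TIME_FORMAT)
print(stamp, "->", parsed.isoformat())
```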
Hi woodcock, thank you for the response and the regex. Unfortunately, this did not work. You are right that the SOAP event has whitespace; I have updated the sample log with the latest event. I also tried the psv sourcetype, and it breaks the SOAP XML into a separate event as well.
The issue was due to the FIELD_DELIMITER setting, which I was using together with your LINE_BREAKER regex. I removed FIELD_DELIMITER and it worked fine. Thank you again!!
Thank you Rich, your suggestion worked for the most part. The only problem I still have is that when an event contains SOAP XML (it occurs only after the last delimiter), Splunk treats the XML as a new event, so the sample event above that contains SOAP XML is broken into 2 events instead of one. Is there any way to get around this?
Hmm... Not sure about that since it works in regex101.com.
Yes, true. I checked it on a regex testing site and your regex works just fine there, but when I ingest the data it breaks into a new event. The only thing I can think of is whitespace that my sample above missed. This is how the SOAP event looks in my data: there is whitespace after "Webservice SOAP Request", and the XML shows up on the next line in Notepad++. Will that impact your regex? I have updated the question accordingly.
hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request
Also, I forgot to mention that I tried the psv sourcetype as well, and it also breaks the SOAP XML into a separate event 😞
I wouldn't expect the white space in the SOAP request to make a difference, but perhaps it does. Try setting SHOULD_LINEMERGE = true.