Splunk Search

I need help with a regex for line_breaker in props.conf

newbie2tech
Communicator

Hi Team,

I need help with a regex for the LINE_BREAKER attribute in props.conf.

My log follows the pattern below, delimited by |. However, it looks like the whole thing is one big event with no newline or carriage return characters, so I am not able to use ([\r\n]+) or ^\w| patterns.

Our architecture requires this, and I won't be using the out-of-the-box psv sourcetype.

Can someone help me with the right LINE_BREAKER pattern to use?

Sample log

hostname|cluster_name|11/26/17 00:43:19|AB- 1|INFO| Retail.getCategoryListCodesFromProperties() retail Code List to show the link ::[02756, 2127]
hostname|cluster_name|11/26/17 00:49:28|AB-No Memory|object|||||||123467|123123123|01
hostname|cluster_name|11/26/17 00:51:42|AB-No Memory|object|||||||123455|123123123|00
hostname|cluster_name|11/26/17 01:04:28|AB-No Memory|object|||||||111111|123123123|01
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><wsse:Security xmlns:wsse="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-secext-1.0.xsd"><wsse:UsernameToken xmlns:wsse="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-secext-1.0.xsd" xmlns:wsu="http://docs.basis-open.org/wss/2004/01/basis-2011-wss-wss-utility-1.0.xsd"></soapenv:Body></soapenv:Envelope>
hostname|cluster_name|11/26/17 01:06:42|AB-No Memory|object|||||||222222|123123123|00
hostname|cluster_name|11/26/17 01:19:28|AB-No Memory|object|||||||333333|123123123|01
hostname|cluster_name|11/26/17 01:21:42|AB-No Memory|object|||||||555555|123123123|10
hostname|cluster_name|11/26/17 01:34:28|AB-No Memory|object|||||||777777|123123123|11
hostname|cluster_name|11/26/17 01:36:42|AB-No Memory|object|||||||111111|123123123|10

Updated SOAP event:

hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request 
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"></soapenv:Envelope>
1 Solution

richgalloway
SplunkTrust

This is a little ugly, but it should work. It breaks each line before the two fields prior to the timestamp field. Of course, it won't work if either hostname or cluster_name can begin with two digits, but you should be able to adjust for that.

LINE_BREAKER = ()\w+\|\w+\|\d\d\/
TIME_PREFIX = \w+\|\w+\|
TIME_FORMAT = %m/%d/%y %H:%M:%S
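
A rough way to sanity-check where this pattern would start new events is to approximate the matching outside Splunk. The sketch below uses Python's re module, which is not the same engine as Splunk's PCRE but should treat these constructs similarly. It is only an approximation: split_events is a hypothetical helper, the SOAP XML is shortened to a placeholder, and the newline before the XML is an assumption based on how the asker describes the data.

```python
import re

# LINE_BREAKER pattern copied from the reply above.
LINE_BREAKER = re.compile(r"()\w+\|\w+\|\d\d\/")

def split_events(raw):
    """Approximate event breaking: start a new event wherever group 1 matches."""
    # Offsets of the (empty) first capture group, i.e. the candidate break points.
    starts = [m.start(1) for m in LINE_BREAKER.finditer(raw)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # text before the first match still forms an event
    bounds = starts + [len(raw)]
    chunks = (raw[a:b].rstrip("\r\n") for a, b in zip(bounds, bounds[1:]))
    return [c for c in chunks if c]

# Lines taken from the sample log; the SOAP XML is a shortened placeholder.
SAMPLE = (
    "hostname|cluster_name|11/26/17 00:49:28|AB-No Memory|object|||||||123467|123123123|01\n"
    "hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request \n"
    "<soapenv:Envelope><soapenv:Body/></soapenv:Envelope>\n"
    "hostname|cluster_name|11/26/17 01:06:42|AB-No Memory|object|||||||222222|123123123|00\n"
)

events = split_events(SAMPLE)
for number, event in enumerate(events, 1):
    print(number, repr(event))
```

Under these assumptions the XML line contains no pipe-delimited prefix, so the pattern alone does not start a new event there. If Splunk still splits the SOAP request, other settings in the stanza are worth checking.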
---
If this reply helps you, Karma would be appreciated.


woodcock
Esteemed Legend

I am assuming that there is either whitespace or a pipe before hostname so try this:

LINE_BREAKER = (^|\s|\|)(?:[^|]+\|){2}\d+\/\d+\/\d+\s+\d+:\d+:\d+\|
TIME_PREFIX = (?:[^|]+\|){2}
TIME_FORMAT = %m/%d/%y %H:%M:%S
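
One thing worth checking with this pattern is where the capture group actually lands, because [^|]+ can also match across a newline. The sketch below approximates this with Python's re module (not Splunk's PCRE engine, so treat it as an indication only). It uses two lines from the sample log with the first message shortened, and assumes the events are separated by a newline.

```python
import re

# LINE_BREAKER pattern copied from the reply above.
PATTERN = re.compile(r"(^|\s|\|)(?:[^|]+\|){2}\d+\/\d+\/\d+\s+\d+:\d+:\d+\|")

# Two events from the sample log (first message shortened), newline-separated.
RAW = (
    "hostname|cluster_name|11/26/17 00:43:19|AB- 1|INFO| retail Code List\n"
    "hostname|cluster_name|11/26/17 00:49:28|AB-No Memory|object|||||||123467|123123123|01\n"
)

matches = list(PATTERN.finditer(RAW))
second = matches[1]

# Group 1 marks the break; Splunk discards the text it captures.
print("captured:", repr(second.group(1)))
print("previous event would end with:", repr(RAW[:second.start(1)][-10:]))
print("next event would start with:", repr(RAW[second.end(1):][:30]))
```

In this approximation the second match begins at the pipe after INFO rather than at the newline, so the last field of the first event would be carried into the next one. That may be one reason the pattern did not behave as expected.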

newbie2tech
Communicator

Hi woodcock, thank you for the response and the regex. Unfortunately, it did not work. You are right that the SOAP event has whitespace; I have updated the sample log with the latest event. I also tried the psv sourcetype, and it too breaks the SOAP event into a new event.


newbie2tech
Communicator

The issue was caused by the FIELD_DELIMITER setting, which I had used in conjunction with your LINE_BREAKER regex. After I removed FIELD_DELIMITER, it worked fine. Thank you again, Rich!

newbie2tech
Communicator

Thank you, Rich. Your suggestion worked for the most part. The only problem I still have is that when an event contains SOAP XML (which occurs only in the last delimited field), Splunk treats the XML as a new event. So the event in my sample above that has SOAP XML is broken into two events instead of one. Is there any way to get around this?

richgalloway
SplunkTrust

Hmm... Not sure about that since it works in regex101.com.


newbie2tech
Communicator

Yes, true. I checked it on a regex testing site and your regex works just fine, but when I ingest the data the event is broken into a new event. The only thing I can think of is that there is whitespace which my sample above missed. This is how the SOAP event looks in my data: there is whitespace after "Webservice SOAP Request", and the XML shows up on the next line in Notepad++. Will that impact your regex? I have updated the question accordingly.

hostname|cluster_name|11/26/17 23:47:17|AB-No Memory|INFO| Webservice SOAP Request

newbie2tech
Communicator

Also, I forgot to mention that I tried the psv sourcetype as well, and it also breaks the SOAP event into a separate event 😞

richgalloway
SplunkTrust

I wouldn't expect the white space in the SOAP request to make a difference, but perhaps it does. Try setting SHOULD_LINEMERGE = true.
