Multiple LINE_BREAKER regex

cdstealer
Contributor

Hi,
I'll cut straight to the chase. I have a sourcetype that contains two log sources. Both are broken into events correctly with this props.conf entry:

TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%:z
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = true
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype

However, one of the sources contains many visible EOL terminators, i.e. the literal text \r\n rather than real control characters (for example source_NR=NR\r\n\r\n). We now need those visible terminators to be parsed as actual EOLs.

I've tried several multi-pattern regexes in LINE_BREAKER, to no avail. From what I've read it should be possible, but everything I try fails and stops line breaking from working at all.

A few things I've tried:

([\r\n]+)|([\\r\\n]+)
([\r\n]+)|\\r\\n
([\r\n]+)(\\r)(\\n)

The list goes on.
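
For illustration, here is a minimal Python sketch of how the bracketed attempt differs from a plain literal pattern. It assumes Python's re module treats these simple patterns the same way as the regex engine Splunk uses, and the sample string is a made-up fragment rather than real feed data. Inside square brackets, \\r\\n is a set of single characters (backslash, r, n), so it also matches ordinary letters in the event text.

```python
import re

# Hypothetical fragment in which \r\n is literal text (backslash + letter),
# not real control characters. The raw string keeps the backslashes as-is.
sample = r"Connection: keep-alive\r\nHost: example"

# Character class as used in the first attempt: any run of '\', 'r' or 'n'.
char_class = re.compile(r"[\\r\\n]+")
# Literal four-character sequence: backslash, r, backslash, n.
literal_seq = re.compile(r"\\r\\n")

class_hits = char_class.findall(sample)
literal_hits = literal_seq.findall(sample)

print(class_hits)    # runs of n/r letters as well as the literal \r\n
print(literal_hits)  # only the literal \r\n
```

If the character class were used as a breaker, every "n" or "r" in the data would be a candidate break point, which may be one reason the first attempt wrecked line breaking.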

Any advice would be greatly appreciated.

Cheers
Steve


cdstealer
Contributor

This is the raw event:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\nHost: <URL>\r\nCookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\nAccept-Encoding: gzip, deflate\r\n\r\n"#015

This is how I thought the LINE_BREAKER would have changed it:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\n
Host: <URL>\r\n
Cookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Connection: keep-alive\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\n
Accept-Encoding: gzip, deflate\r\n\r\n"#015

But I'm starting to lean towards using transforms to replace the \r\n so that the whole event is standardised?


jeffland
SplunkTrust

Hm. I'd also suggest replacing those \r\n with an actual linebreak. Have a look here and see if it works for you.
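
As a rough sketch of what "replacing those \r\n with an actual linebreak" means, the Python below converts each run of literal \r\n sequences into one real newline. The to_real_newlines helper is hypothetical, the request string is shortened from the event posted in this thread, and this only demonstrates the text transformation, not how Splunk itself would be configured to do it.

```python
import re

# Shortened request value; the raw string keeps \r\n as literal text.
request = r"GET /bir_xml HTTP/1.1\r\nHost: <URL>\r\nConnection: keep-alive\r\n\r\n"

def to_real_newlines(text):
    """Collapse each run of literal \\r\\n sequences into a single real newline."""
    return re.sub(r"(?:\\r\\n)+", "\n", text)

converted = to_real_newlines(request)
lines = converted.splitlines()

for line in lines:
    print(line)
```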

cdstealer
Contributor

Nice 🙂 Thanks, jeffland. Very much appreciated.


cdstealer
Contributor

Just for completeness 🙂

My props stanza is:

[f5]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%:z
BREAK_ONLY_BEFORE_DATE = True
LINE_BREAKER = ([\r\n\$])
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype
SEDCMD-newline = s/\\r\\n/,/g
SEDCMD-eventend = s/#015//g

So now all the fields are correctly extracted and the annoying #015 is removed. Plus the other source is untouched.
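
The effect of the two SEDCMD lines can be approximated in Python as below. This is a sketch under assumptions: re.sub stands in for Splunk's sed-style replacement, the apply_sedcmds name is invented for the example, and the event is a shortened piece of the sample posted in this thread.

```python
import re

# Shortened event tail; the raw string keeps \r\n and #015 as literal text.
event = r'request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\n\r\n"#015'

def apply_sedcmds(text):
    """Mimic s/\\r\\n/,/g followed by s/#015//g on a single event."""
    text = re.sub(r"\\r\\n", ",", text)  # literal \r\n becomes a comma
    text = re.sub(r"#015", "", text)     # drop the trailing #015 marker
    return text

cleaned = apply_sedcmds(event)
print(cleaned)
```

Note that consecutive \r\n sequences each become their own comma, so the end of the request value ends up with two commas in a row.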

cdstealer
Contributor

2015-04-21T10:51:26+01:00 <> ASM: unit_hostname="<>",management_ip_address="<>",http_class_name="/Common/pl_restricted_L0_prod",web_application_name="/Common/pl_restricted_L0_prod",policy_name="/Common/pl_restricted_L0_prod",policy_apply_date="2015-04-20 21:44:42",violations="Attack signature detected",support_id="16995741371937106148",request_status="blocked",response_code="0",ip_client="46.201.133.82",route_domain="0",method="GET",protocol="HTTP",query_string="",x_forwarded_for_header_value="N/A",sig_ids="300000002",sig_names="parimatchru",date_time="2015-04-21 10:51:25",severity="Error",attack_type="Abuse of Functionality",geo_location="UA",ip_address_intelligence="N/A",username="N/A",session_id="8f08ae0f2fbd5d82",src_port="55263",dest_port="80",dest_ip="<>",sub_violations="",virus_name="N/A",uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\nConnection: keep-alive\r\nAccept: image/webp,/;q=0.8\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1717.129 Amigo/32.0.1717.129 MRCHROME SOC Safari/537.36\r\nReferer: http://start.parimatchru.com/bonusnew/?btag=a_3615b_234c_231947&id=231947\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\nCookie: banner_click=aleshasavin,NA,NA,NA,admap:159955966FE625989E443CA9CEA4BE36CCBBFCB%3Bsource:[var1]%3Bzone:1487412695%3Bchannel:185050786; clickinfo=pid=185050786&bid=1487412695; vars_info=; source_NR=NR\r\n\r\n"#015


cdstealer
Contributor

Hi Jeffland,
The above is an event that we want to break down: after each \r\n, the text that follows should start on a new line. The alternative I could try is to set up a SEDCMD in props.conf and replace each \r\n with a comma. I believe this would also fix the automatic field extraction.

Cheers
Steve


jeffland
SplunkTrust

I have the feeling that your event text was somehow corrupted when you posted it. Could you post it as a text file, or as code? There are some "rn" in there, also one with backslashes, but I doubt this is what you wanted to post.
As for your line breaker, every place the regex matches leads to an "event break", i.e. each match in your data starts a new event. That's why I doubt you can achieve what you need with LINE_BREAKER. But I still haven't fully understood what you need your event to look like. Do you want Splunk to display a line break when it shows the events returned from a search?
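
To make the "event break" point concrete, the sketch below uses Python's re.split as a rough stand-in for event breaking (an assumption; Splunk's pipeline does more than split text). The two-event stream and the break_events helper are hypothetical.

```python
import re

# Two hypothetical events separated by a real newline. Inside the first one,
# \r\n appears as literal text (the raw-string prefix keeps the backslashes).
stream = r'T1 request="GET / HTTP/1.1\r\nHost: a"' + "\n" + r'T2 request="GET /x"'

def break_events(text, breaker):
    """Split wherever the breaker matches and drop empty pieces."""
    return [part for part in re.split(breaker, text) if part]

# Breaking on real line endings only keeps each event whole.
real_only = break_events(stream, r"[\r\n]+")
# Also breaking on the literal \r\n cuts the first event in two.
with_literal = break_events(stream, r"[\r\n]+|\\r\\n")

print(len(real_only), real_only)
print(len(with_literal), with_literal)
```

In the second case the piece starting with Host: becomes a separate event with no timestamp of its own, which is the behaviour described above rather than a line break inside one event.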


cdstealer
Contributor

I'll have to post it as an "answer" as the comment box won't allow the volume of text.


jeffland
SplunkTrust

I haven't fully understood what behavior you need. New events are supposed to begin just like they did until now, but inside of them you need linebreaks (i.e. there need to be new lines at the beginning of an event)?
