Splunk Search

Multiple LINE_BREAKER regex

cdstealer
Contributor

Hi,
I'll cut straight to the chase. I have a sourcetype that contains two log sources. Both are broken into events correctly by the following props entry:

TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = true
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype

However, one of the sources contains many visible (literal) EOL terminators, e.g. source_NR=NR\r\n\r\n. These visible EOL terminators now need to be parsed as actual EOLs.

I've tried various multi-pattern regexes in LINE_BREAKER, to no avail. From what I've read this should be possible, but everything I try fails and breaks line breaking altogether.

A few things I've tried:

([\r\n]+)|([\\r\\n]+)
([\r\n]+)|\\r\\n
([\r\n]+)(\\r)(\\n)

The list goes on.
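For reference, here is a sketch of why those attempts misbehave, assuming standard PCRE semantics as described in the props.conf spec. LINE_BREAKER must contain a capturing group; each match marks an event boundary, and the text matched by the first capturing group is discarded. Inside a character class, [\\r\\n] matches a single backslash, "r" or "n" character, so the first attempt breaks on almost every r and n in the data. A syntactically sound alternation keeps both alternatives inside the first capturing group, as below, but note that it still turns every literal \r\n into an event boundary. That splits the HTTP request across several events rather than adding line breaks inside one event:

```ini
# Sketch only: both alternatives sit inside capture group 1, so whichever
# one matches is discarded and marks an event boundary.
[f5]
SHOULD_LINEMERGE = false
# (?:\\r\\n)+ matches one or more literal backslash-r-backslash-n sequences;
# [\r\n]+ matches real carriage returns and newlines.
LINE_BREAKER = ((?:\\r\\n)+|[\r\n]+)
```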

Any advice would be greatly appreciated.

Cheers
Steve

1 Solution

cdstealer
Contributor

This is the raw event:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\nHost: <URL>\r\nCookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\nAccept-Encoding: gzip, deflate\r\n\r\n"#015

This is how I expected LINE_BREAKER to change it:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\n
Host: <URL>\r\n
Cookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Connection: keep-alive\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\n
Accept-Encoding: gzip, deflate\r\n\r\n"#015

But I'm starting to lean towards using a transform to replace the \r\n sequences so that the whole event is standardised.

jeffland
SplunkTrust

Hm. I'd also suggest replacing those \r\n sequences with actual line breaks. Have a look here and see if it works for you.

cdstealer
Contributor

Nice 🙂 Thanks, jeffland, very much appreciated.

cdstealer
Contributor

Just for completeness 🙂

My props stanza is:

[f5]
TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
BREAK_ONLY_BEFORE_DATE = True
LINE_BREAKER = ([\r\n\$])
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype
SEDCMD-newline = s/\\r\\n/,/g
SEDCMD-eventend = s/#015//g

Now all the fields are extracted correctly, the annoying #015 is removed, and the other source is untouched.

cdstealer
Contributor

2015-04-21T10:51:26+01:00 <> ASM: unit_hostname="<>",management_ip_address="<>",http_class_name="/Common/pl_restricted_L0_prod",web_application_name="/Common/pl_restricted_L0_prod",policy_name="/Common/pl_restricted_L0_prod",policy_apply_date="2015-04-20 21:44:42",violations="Attack signature detected",support_id="16995741371937106148",request_status="blocked",response_code="0",ip_client="46.201.133.82",route_domain="0",method="GET",protocol="HTTP",query_string="",x_forwarded_for_header_value="N/A",sig_ids="300000002",sig_names="parimatchru",date_time="2015-04-21 10:51:25",severity="Error",attack_type="Abuse of Functionality",geo_location="UA",ip_address_intelligence="N/A",username="N/A",session_id="8f08ae0f2fbd5d82",src_port="55263",dest_port="80",dest_ip="<>",sub_violations="",virus_name="N/A",uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\nConnection: keep-alive\r\nAccept: image/webp,/;q=0.8\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1717.129 Amigo/32.0.1717.129 MRCHROME SOC Safari/537.36\r\nReferer: http://start.parimatchru.com/bonusnew/?btag=a_3615b_234c_231947&id=231947\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\nCookie: banner_click=aleshasavin,NA,NA,NA,admap:159955966FE625989E443CA9CEA4BE36CCBBFCB%3Bsource:[var1]%3Bzone:1487412695%3Bchannel:185050786; clickinfo=pid=185050786&bid=1487412695; vars_info=; source_NR=NR\r\n\r\n"#015

cdstealer
Contributor

Hi Jeffland,
The above is an event that we want to break down: for each \r\n, we need the text that follows it to start on a new line. Alternatively, I could set up a SEDCMD (in props.conf) that replaces each \r\n with a comma, which I believe would also fix the automatic field extraction.

Cheers
Steve

jeffland
SplunkTrust

I have the feeling that your event text was somehow corrupted when you posted it. Could you post it as a text file, or as code? There are some "rn" sequences in there, and one with backslashes, but I doubt that is what you meant to post.
As for your LINE_BREAKER: every place its regex matches becomes an "event break", i.e. each time the regex matches your data a new event starts, and the text matched by the first capturing group is discarded. That's why I doubt you can achieve what you need with LINE_BREAKER. But I still haven't fully understood what you need your event to look like. Do you want Splunk to display a line break when it shows the events returned by a search?

cdstealer
Contributor

I'll have to post it as an "answer" as the comment box won't allow the volume of text.

jeffland
SplunkTrust

I haven't fully understood what behavior you need. Should new events begin just as they do now, but with line breaks inside them (i.e. new lines starting at certain points within an event)?
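If the goal is for events to keep starting at each timestamp while real line breaks are allowed inside an event, one common approach is to make LINE_BREAKER match only a newline that is immediately followed by the event's leading timestamp. This is a sketch only, assuming every event starts with an ISO 8601 timestamp such as 2015-04-21T12:55:25+01:00. It only helps when the raw data already contains real newlines inside events; it does not turn the literal \r\n text in these events into line breaks:

```ini
# Sketch: break only on newlines that are immediately followed by a timestamp
# of the form YYYY-MM-DDTHH:MM:SS at the start of the next event.
[f5]
SHOULD_LINEMERGE = false
# Capture group 1 (the newlines) is discarded; the lookahead (?=...) is not
# consumed, so the timestamp stays at the start of the next event.
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S%:z
```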
