
Splunk for Blue Coat ProxySG 3.0.7: The user agent regex fails when the field is just a dash "-" and needs to be updated

Explorer

Hello,

The regex in 3.0.7 fails when the user agent field is just a dash (-), as in this example:

2015-11-10 02:00:00 100 xxx.xx.xxx.xxx XYZ abc\ryolo - OBSERVED "Search Engines/Portals" -  200 TCP_NC_MISS GET text/html;%20charset=ISO-8859-1 http www.google.co.uk 80 / ?.... - - 1.2.3.4 57738 1012 - "none" "none"

A better regex is:

^(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+\"(?<x_bluecoat_application_operation>[^\"]+)\"
1 Solution

Explorer

Posting an improved regex:

^(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"{0,1}(?<x_bluecoat_application_name>[^\"]+)\"{0,1}\s+\"{0,1}(?<x_bluecoat_application_operation>[^\"]+)\"{0,1}
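For anyone who wants to check this pattern outside Splunk, here is a minimal sketch in Python (chosen only for illustration; Splunk itself uses PCRE). It rebuilds the same pattern from small helpers instead of pasting the long string, and spells the named groups as `(?P<name>...)` because Python's `re` module does not accept `(?<name>...)`. The helper names `plain`, `quoted`, `maybe_quoted` and `extract` are made up for this example; the sample line is the one from the question.

```python
import re

def plain(name):
    # A field with no whitespace in it, same as (?<name>[^\s]+)
    return rf"(?P<{name}>[^\s]+)"

def quoted(name):
    # A field that must be wrapped in double quotes
    return rf'"(?P<{name}>[^"]+)"'

def maybe_quoted(name):
    # A field whose quotes are optional; "? is equivalent to [\"]{0,1}
    return rf'"?(?P<{name}>[^"]+)"?'

FIELDS = (
    [plain(n) for n in ("date", "time", "time_taken", "c_ip", "cs_username",
                        "cs_auth_group", "x_exception_id", "filter_result")]
    + [quoted("category")]
    + [plain(n) for n in ("http_referrer", "sc_status", "action", "cs_method",
                          "http_content_type", "cs_uri_scheme", "cs_host",
                          "cs_uri_port", "cs_uri_path", "cs_uri_query",
                          "cs_uri_extension")]
    + [maybe_quoted("http_user_agent")]
    + [plain(n) for n in ("s_ip", "sc_bytes", "cs_bytes")]
    + [maybe_quoted(n) for n in ("x_virus_id", "x_bluecoat_application_name",
                                 "x_bluecoat_application_operation")]
)

# Fields are separated by one or more whitespace characters.
PATTERN = re.compile("^" + r"\s+".join(FIELDS))

SAMPLE = r'2015-11-10 02:00:00 100 xxx.xx.xxx.xxx XYZ abc\ryolo - OBSERVED "Search Engines/Portals" -  200 TCP_NC_MISS GET text/html;%20charset=ISO-8859-1 http www.google.co.uk 80 / ?.... - - 1.2.3.4 57738 1012 - "none" "none"'

def extract(line):
    """Return a dict of field name -> value, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

fields = extract(SAMPLE)
print(fields["http_user_agent"])  # -
print(fields["s_ip"])             # 1.2.3.4
```

If your own events come out differently, feeding one raw line through `extract` and printing the whole dict is a quick way to see which field starts swallowing its neighbours.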


Contributor

None of these regexes work to extract http_user_agent. Did anyone get it right?


Explorer

Slightly tweaked to match more weirdness:

^(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"{0,1}(?<x_bluecoat_application_name>[^\"]+)\"{0,1}\s+\"{0,1}(?<x_bluecoat_application_operation>[^\"]+)\"{0,1}
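One thing to keep in mind with `[^\"]+` for the user agent: when the value is an unquoted dash, nothing stops that group from running across the following fields, and it only lands on the right value through backtracking. A stricter option, offered here as a sketch and not as what the add-on ships, is to accept either a quoted string or a single bare token. The example is in Python for illustration; `UA_FIELD` and `parse_user_agent` are hypothetical names, and the two alternatives use separate group names because Python does not allow the same group name twice.

```python
import re

# Hypothetical fragment for one field: either "quoted text" or one bare token,
# followed by whitespace or end of input.
UA_FIELD = re.compile(r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"]+))(?=\s|$)')

def parse_user_agent(text):
    """Return the user agent at the start of text, or None if nothing matches."""
    m = UA_FIELD.match(text)
    if m is None:
        return None
    # Exactly one of the two groups takes part in a successful match.
    if m.group("quoted") is not None:
        return m.group("quoted")
    return m.group("bare")

print(parse_user_agent('- 1.2.3.4 57738'))              # -
print(parse_user_agent('"Mozilla/5.0 (X11)" 1.2.3.4'))  # Mozilla/5.0 (X11)
```

The same idea could be applied to the other optionally quoted fields, but it would need testing against real ProxySG events before going into an extraction.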

Community Manager

Hi @konrads

Thanks for sharing this tip with the community. Can you actually post the regex as a formal answer in the "Enter your answer here..." box below and accept it to resolve this post? It'll make it easier for other users to find.

Also, you might want to consider formally submitting this issue as a bug here:
http://www.splunk.com/r/bugs
