Splunk for Blue Coat ProxySG: About 5% of our logs did not get any field extraction. Has anyone noticed bad transforms.conf regex?

Explorer

With the ProxySG using the default "bcreportermain_v1" output, we found that about 5% of our logs did not get any field extraction. We noticed that when "http_user_agent" was blank (represented by a hyphen), it was not quoted, even though this is normally a quoted field. So we surmised that the regex might be the problem. Turns out we were correct.

In the line below, the hyphen just before "2.2.2.2" is supposed to be the http_user_agent value. As you can see, it's unquoted.

2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" -  200 TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 /Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"

In the line below, you can clearly see the quoted User-Agent field preceding 4.4.4.4.

2015-12-02 14:38:17 1662 1.1.1.2 - - - OBSERVED "Web Ads/Analytics" -  200 TCP_NC_MISS GET image/gif http p.liadm.com 80 /imp ?s=5 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)" 4.4.4.4 478 982 - "none" "none"

Original transform for bcreporter_v1

(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+\"(?<http_user_agent>[^\"]+)\"\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+\"(?<x_bluecoat_application_operation>[^\"]+)\"

Here is the user-agent portion by itself:

\"(?<http_user_agent>[^\"]+)\"
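To see the failure outside Splunk, here is a minimal Python sketch that runs the original transform against the two sample events above. It assumes Python's `re` module treats this pattern the same way PCRE does. Because `re` spells named groups `(?P<name>...)` instead of `(?<name>...)`, the hypothetical helper `to_python()` rewrites them before compiling.

```python
import re

# The two sample events from the post (IPs already anonymised by the poster).
UNQUOTED_UA = (
    '2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" -  200 '
    'TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 '
    '/Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"'
)
QUOTED_UA = (
    '2015-12-02 14:38:17 1662 1.1.1.2 - - - OBSERVED "Web Ads/Analytics" -  200 '
    'TCP_NC_MISS GET image/gif http p.liadm.com 80 /imp ?s=5 - '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)" 4.4.4.4 478 982 - "none" "none"'
)

# Original bcreporter_v1 transform (PCRE syntax), split across lines for readability.
# Note the mandatory \" on both sides of http_user_agent.
ORIGINAL_PCRE = (
    r'(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+'
    r'(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+'
    r'(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+'
    r'(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+'
    r'(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+'
    r'(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+'
    r'(?<cs_uri_extension>[^\s]+)\s+\"(?<http_user_agent>[^\"]+)\"\s+'
    r'(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+'
    r'\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+'
    r'\"(?<x_bluecoat_application_operation>[^\"]+)\"'
)


def to_python(pcre: str) -> "re.Pattern[str]":
    """Hypothetical helper: turn PCRE-style (?<name>...) groups into Python's (?P<name>...)."""
    # The lookahead skips (?<= and (?<! lookbehinds, which share the same prefix.
    return re.compile(re.sub(r'\(\?<(?=[A-Za-z_])', '(?P<', pcre))


original = to_python(ORIGINAL_PCRE)
for label, event in (("unquoted", UNQUOTED_UA), ("quoted", QUOTED_UA)):
    match = original.search(event)  # unanchored search; the pattern has no ^ either
    print(label, "->", match.group("http_user_agent") if match else "NO MATCH")
```

Run as-is, it should print `NO MATCH` for the unquoted event and the Mozilla string for the quoted one. The literal `\"` before `http_user_agent` has nothing to match when the value is a bare hyphen, so the whole event gets no fields at all.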

Config for "bcreportermain_v1"

date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer)  sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id x-bluecoat-application-name x-bluecoat-application-operation
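As a sanity check that the event itself follows this format, the Python sketch below pairs each format field with one value from the unquoted sample event. It uses `shlex.split` as a stand-in tokenizer (an assumption for illustration only; Splunk does not parse events this way). `shlex.split` splits on whitespace, keeps quoted values whole, and strips their quotes. `pair_fields` is a hypothetical helper.

```python
from __future__ import annotations

import shlex

# Field order of the "bcreportermain_v1" format, as listed above (27 fields).
BCREPORTERMAIN_V1 = (
    "date time time-taken c-ip cs-username cs-auth-group x-exception-id "
    "sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method "
    "rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query "
    "cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id "
    "x-bluecoat-application-name x-bluecoat-application-operation"
).split()

# Sample event from the post whose user agent is an unquoted hyphen.
UNQUOTED_UA = (
    '2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" -  200 '
    'TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 '
    '/Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"'
)


def pair_fields(event: str) -> dict[str, str]:
    """Hypothetical helper: map each format field to its value in one event."""
    # Whitespace split that keeps "quoted values" together and drops the quotes.
    values = shlex.split(event)
    if len(values) != len(BCREPORTERMAIN_V1):
        raise ValueError(f"expected {len(BCREPORTERMAIN_V1)} values, got {len(values)}")
    return dict(zip(BCREPORTERMAIN_V1, values))


fields = pair_fields(UNQUOTED_UA)
print(fields["cs(User-Agent)"], fields["s-ip"])  # prints: - 2.2.2.2
```

The hyphen lands in `cs(User-Agent)` and `2.2.2.2` in `s-ip`, which suggests the data itself matches the format and only the regex's assumption that the user agent is always quoted is off.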

We're not sure whether the log format should be changed so that the field is always quoted or whether the regex is wrong. Curious if anyone else has noticed this.

Community Manager

Hi @brigancc

Could you post the fixed transforms.conf regular expression that solved your issue as an answer in the "Enter your answer here..." box below? That way, this post will show as having an answer that may help other users, instead of showing as unresolved.

Explorer

Very good point. Thank you for the suggestion. I'll update the post and add the solution as an answer. Thanks!

Explorer

We used the awesome regex tool at http://regex101.com/#PCRE to visualize the matching and found that the http_user_agent named capture group was surrounded by literal quotes. Because those quotes were required, the whole regex failed to match whenever an event had no user agent.

The fix was to make the quotes optional by adding the "?" quantifier after each one, so each quote matches 0 or 1 times.

After applying the change, we went from 95% overall field extraction to 100%.

Fixed transform for bcreporter_v1

(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+\"?(?<http_user_agent>[^\"]+)\"?\s+(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+\"(?<x_bluecoat_application_operation>[^\"]+)\"

Here is the fixed user-agent portion by itself:

\"?(?<http_user_agent>[^\"]+)\"?
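Repeating the same kind of Python check with the fixed transform shows both sample events matching and the bare hyphen landing in `http_user_agent`. As before, this is a sketch that assumes Python's `re` behaves like PCRE for this pattern, with the `(?<name>...)` groups rewritten to `(?P<name>...)` before compiling.

```python
import re

# The two sample events from the post (IPs already anonymised by the poster).
UNQUOTED_UA = (
    '2015-12-02 14:38:17 84 1.1.1.1 - - - OBSERVED "Business/Economy" -  200 '
    'TCP_NC_MISS GET text/html;charset=UTF-8 http prod-app.enmetric.com 80 '
    '/Command-war/retrieve ?limit=5 - - 2.2.2.2 198 129 - "none" "none"'
)
QUOTED_UA = (
    '2015-12-02 14:38:17 1662 1.1.1.2 - - - OBSERVED "Web Ads/Analytics" -  200 '
    'TCP_NC_MISS GET image/gif http p.liadm.com 80 /imp ?s=5 - '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)" 4.4.4.4 478 982 - "none" "none"'
)

# Fixed bcreporter_v1 transform (PCRE syntax); only the user-agent quotes became \"?.
FIXED_PCRE = (
    r'(?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+'
    r'(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+'
    r'(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+'
    r'(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+'
    r'(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+'
    r'(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+'
    r'(?<cs_uri_extension>[^\s]+)\s+\"?(?<http_user_agent>[^\"]+)\"?\s+'
    r'(?<s_ip>[^\s]+)\s+(?<sc_bytes>[^\s]+)\s+(?<cs_bytes>[^\s]+)\s+'
    r'\"?(?<x_virus_id>[^\"]+)\"?\s+\"(?<x_bluecoat_application_name>[^\"]+)\"\s+'
    r'\"(?<x_bluecoat_application_operation>[^\"]+)\"'
)

# Rewrite PCRE (?<name> groups to Python's (?P<name> (lookbehinds are left alone).
fixed = re.compile(re.sub(r'\(\?<(?=[A-Za-z_])', '(?P<', FIXED_PCRE))

for label, event in (("unquoted", UNQUOTED_UA), ("quoted", QUOTED_UA)):
    m = fixed.search(event)
    print(label, "->", repr(m.group("http_user_agent")), m.group("s_ip"))
```

Note that the unquoted case matches only because the regex engine backtracks: `[^\"]+` first grabs `- 2.2.2.2 198 129 - ` (everything up to the next quote) and then gives characters back until `s_ip`, `sc_bytes`, `cs_bytes`, and the trailing quoted fields can all match, leaving just the hyphen.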

Community Manager

Thanks @brigancc 🙂

Cheers!
