I can't find how to extract the User Agent field from the Blue Coat proxy logs. I couldn't find the correct answer yet on the forum. All of the answers I went through had regex that didn't work correctly.
REGEX = (?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}
[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}
Extracts 'dashes like this, together with a dvc_ip.
Does anyone have this issues sorted out already?
This looks like bluecoat 6.5+ logging which is now covered in the Splunk supported TA
(?:"([^"]+)"|(\S+))\s+(?:"(\d{1,2}:\d{1,2}:\d{1,2})"|(\d{1,2}:\d{1,2}:\d{1,2}))\s+(?:"(\d+)"|(\d+))\s+(?:"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s*$
I recommend pulling the TA from splunk base and starting with the existing solution. What I don't like is the current code will permit - to enter into the field values. So I would add EVAL-field= nullif() to address that issue.
Link for the new TA
TA for Bluecoat
This looks like bluecoat 6.5+ logging which is now covered in the Splunk supported TA
(?:"([^"]+)"|(\S+))\s+(?:"(\d{1,2}:\d{1,2}:\d{1,2})"|(\d{1,2}:\d{1,2}:\d{1,2}))\s+(?:"(\d+)"|(\d+))\s+(?:"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s*$
I recommend pulling the TA from splunk base and starting with the existing solution. What I don't like is the current code will permit - to enter into the field values. So I would add EVAL-field= nullif() to address that issue.
Link for the new TA
TA for Bluecoat
This totally fixed the issue. I downloaded the new add-on for new logs, and play with regex and it worked. Thanks a lot for your help on that.
The posted example does not contain a user agent string would look like this
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0
It does contain "-" and this is why this log is messing the field extraction!
Any idea how to include a "-" in the regex?
Can you paste one of the events that are being mis... mis... what's the word? Misinterpreted? Misregexed? Misparsed? Well, no matter on the term. 🙂
Where in the event is "10.106.4.11"? You could just be missing one extraction or something.
FYI, optional characters or groups can be done like ab?c, which would match abc or ac, because the b would be optional since it's followed by a ?.
2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - 302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"