I can't work out how to extract the User Agent field from Blue Coat proxy logs, and I haven't found a working answer on the forum yet. Every regex in the answers I went through failed to extract the field correctly. This is what I have now:
REGEX = (?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}
[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}
That last part extracts a run of dashes together with a dvc_ip instead of the user agent.
Has anyone sorted this issue out already?
This looks like Blue Coat 6.5+ logging, which is now covered by the Splunk-supported TA:
(?:"([^"]+)"|(\S+))\s+(?:"(\d{1,2}:\d{1,2}:\d{1,2})"|(\d{1,2}:\d{1,2}:\d{1,2}))\s+(?:"(\d+)"|(\d+))\s+(?:"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s*$
I recommend pulling the TA from Splunkbase and starting with the existing solution. What I don't like is that the current code lets a literal - end up as a field value, so I would add EVAL-<field> = nullif(<field>, "-") to address that issue.
Link for the new TA
TA for Bluecoat
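To illustrate the two ideas in this answer (each field is either quoted or bare, and a lone - should become a null), here is a small Python sketch. It is not the TA's code and not Splunk configuration: Python's re module stands in for Splunk's regex engine, the helper names tokenize and nullif_dash are made up for this example, and the input line is invented rather than taken from a real log.

```python
import re

# A field is either a double-quoted string (which may contain spaces) or a bare token.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def tokenize(event):
    """Split an event into fields, keeping each quoted string as one field."""
    fields = []
    for match in TOKEN.finditer(event):
        quoted, bare = match.groups()
        fields.append(quoted if quoted is not None else bare)
    return fields

def nullif_dash(value):
    """Rough analogue of eval's nullif(value, "-"): a lone dash means 'no value'."""
    return None if value == "-" else value

# Made-up three-field line, not a real Blue Coat event.
line = 'GET - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"'
fields = [nullif_dash(f) for f in tokenize(line)]
print(fields)
```

The quoted user agent stays in one piece despite its spaces, and the dash in the second position comes back as None instead of a literal "-".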
This totally fixed the issue. I downloaded the new add-on for the new log format, played with the regex, and it worked. Thanks a lot for your help with that.
The posted example does not contain a user agent string, which would look like this:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0
It does contain "-", and this is why this log is messing up the field extraction!
Any idea how to include a "-" in the regex?
Can you paste one of the events that are being mis... mis... what's the word? Misinterpreted? Misregexed? Misparsed? Well, no matter on the term. 🙂
Where in the event is "10.106.4.11"? You could just be missing one extraction or something.
FYI, optional characters or groups can be written like ab?c, which would match abc or ac, because the b is optional since it's followed by a ?.
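A quick way to see the ? quantifier in action is a few lines of Python. This is only a sketch using Python's re module rather than Splunk itself; the second pattern, (?:ab)?c, is an extra example of an optional group that is not from the post above.

```python
import re

# b? makes the single character b optional.
single = re.compile(r"ab?c")
single_results = {s: bool(single.fullmatch(s)) for s in ("abc", "ac", "abbc")}

# (?:ab)? makes the whole group ab optional.
group = re.compile(r"(?:ab)?c")
group_results = {s: bool(group.fullmatch(s)) for s in ("abc", "c", "ac")}

print(single_results)  # {'abc': True, 'ac': True, 'abbc': False}
print(group_results)   # {'abc': True, 'c': True, 'ac': False}
```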
2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - 302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"
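Running a condensed version of the question's regex against this event reproduces the symptom. The sketch below is Python, not Splunk configuration, and makes a few assumptions: the (?:\S+\s+){8} and {11} repetitions are shorthand for the 20 named fields in the original regex, Python needs (?P<name>...) where the original uses (?<name>...), and the "tightened" pattern with its quoted and bare group names is only one possible fix, not the TA's.

```python
import re

event = ('2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - 302 '
         'TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"')

# Shorthand for the 20 fields before the user agent:
# 8 bare fields, the quoted category, then 11 more bare fields.
PREFIX = r'(?:\S+\s+){8}"[^"]+"\s+(?:\S+\s+){11}'

# Same tail as the question: optional quote, then anything up to the next quote.
original = re.compile(PREFIX + r'"?(?P<http_user_agent>[^"]+)"?')

# Possible fix: the value is either fully quoted or a single bare token such as "-".
tightened = re.compile(PREFIX + r'(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))')

loose = original.match(event).group("http_user_agent")
m = tightened.match(event)
tight = m.group("quoted") if m.group("quoted") is not None else m.group("bare")

print(repr(loose))  # '- 10.115.6.11 177 80 - '
print(repr(tight))  # '-'
```

Because [^"]+ keeps going until it meets a quote, an unquoted - drags the following columns (including the IP address) into the field. Matching either a quoted string or a single bare token stops at the dash.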