
Blue Coat Proxy Logs - User Agent Field Extraction

daniel_augustyn
Contributor

I can't figure out how to extract the User Agent field from Blue Coat proxy logs, and I haven't found a correct answer on the forum yet. Every answer I went through had a regex that didn't work correctly.

REGEX = (?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}

[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}

It extracts the '-' placeholder dashes together with the dvc_ip, like this:

  • 10.106.4.11
  • 10.106.4.11

Has anyone already sorted out this issue?
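To see why the last group misbehaves, here is a small Python sketch that runs the regex above against the sample event posted later in this thread. The only change is converting the `(?<name>...)` groups to Python's `(?P<name>...)` syntax; `extract` and `SAMPLE` are illustrative names, not part of any Splunk API. Because `[^\"]+` consumes everything up to the next double quote, an unquoted `-` user agent drags the following dvc_ip, ports, and another dash into `http_user_agent`.

```python
import re
from typing import Dict, Optional

# The original poster's regex, with (?<name>...) rewritten as Python's
# (?P<name>...); otherwise unchanged.
USER_REGEX = re.compile(
    r'(?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)\s+'
    r'(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)\s+'
    r'(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+)\s+\"(?P<category>[^\"]+)\"\s+'
    r'(?P<http_referrer>[^\s]+)\s+(?P<sc_status>[^\s]+)\s+(?P<action>[^\s]+)\s+'
    r'(?P<cs_method>[^\s]+)\s+(?P<http_content_type>[^\s]+)\s+(?P<cs_uri_scheme>[^\s]+)\s+'
    r'(?P<cs_host>[^\s]+)\s+(?P<cs_uri_port>[^\s]+)\s+(?P<cs_uri_path>[^\s]+)\s+'
    r'(?P<cs_uri_query>[^\s]+)\s+(?P<cs_uri_extension>[^\s]+)\s+'
    r'[\"]{0,1}(?P<http_user_agent>[^\"]+)[\"]{0,1}'
)

# Sample event copied from this thread.
SAMPLE = ('2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - '
          '302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - '
          '10.115.6.11 177 80 - "none" "none"')

def extract(event: str) -> Optional[Dict[str, str]]:
    """Apply the regex and return its named groups, or None if it doesn't match."""
    m = USER_REGEX.search(event)
    return m.groupdict() if m else None

if __name__ == "__main__":
    fields = extract(SAMPLE)
    # The "user agent" group runs on until the first double quote it finds.
    print(repr(fields["http_user_agent"]))  # '- 10.115.6.11 177 80 - '
```

Every fixed field lines up correctly; only the final, quote-delimited group overshoots when the user agent is an unquoted `-`.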


rfaircloth_splu
Splunk Employee

This looks like Blue Coat 6.5+ logging, which is now covered by the Splunk-supported TA (technology add-on).

(?:"([^"]+)"|(\S+))\s+(?:"(\d{1,2}:\d{1,2}:\d{1,2})"|(\d{1,2}:\d{1,2}:\d{1,2}))\s+(?:"(\d+)"|(\d+))\s+(?:"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s*$

I recommend pulling the TA from Splunkbase and starting from its existing solution. What I don't like is that the current code lets "-" through into the field values, so I would add EVAL-<field> = nullif(<field>, "-") to address that issue.
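The core idea of the TA regex is that every field is matched by the same alternation, `(?:"([^"]+)"|(\S+))`: either a double-quoted value (which may contain spaces) or a bare run of non-whitespace. The Python sketch below applies that idea as a simple tokenizer and then maps `-` to `None`, which is roughly what an `EVAL-<field> = nullif(<field>, "-")` does in Splunk. It is an illustration under those assumptions, not the TA's actual code; `split_fields` and `nullif_dash` are hypothetical helper names.

```python
import re
from typing import List, Optional

# One field: a double-quoted value (quotes stripped) or a bare token,
# mirroring the (?:"([^"]+)"|(\S+)) alternation repeated in the TA regex.
FIELD = re.compile(r'"([^"]+)"|(\S+)')

def split_fields(event: str) -> List[str]:
    """Split a space-separated Blue Coat event into raw field values."""
    # findall returns (quoted, bare) tuples; the group that didn't match is ''.
    return [quoted or bare for quoted, bare in FIELD.findall(event)]

def nullif_dash(value: str) -> Optional[str]:
    """Rough analogue of Splunk's nullif(field, "-"): '-' becomes None."""
    return None if value == "-" else value

# Sample event copied from this thread.
SAMPLE = ('2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - '
          '302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - '
          '10.115.6.11 177 80 - "none" "none"')

if __name__ == "__main__":
    values = [nullif_dash(v) for v in split_fields(SAMPLE)]
    print(len(values))  # 27
    print(values[8])    # Technology/Internet (quotes removed)
```

The sample splits into 27 values, which matches the number of field groups in the TA regex above, and the unquoted `-` placeholders come out as empty (None) values instead of leaking into neighbouring fields.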

Link to the new TA:
TA for Bluecoat


daniel_augustyn
Contributor

This totally fixed the issue. I downloaded the new add-on for the new log format, played with the regex, and it worked. Thanks a lot for your help on that.


rfaircloth_splu
Splunk Employee

The posted example does not contain a user agent string. A user agent string would look like this:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0

daniel_augustyn
Contributor

It does contain "-", and that is why this log is messing up the field extraction!


daniel_augustyn
Contributor

Any idea how to include a "-" in the regex?
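Rather than special-casing the dash, one option is to make the final group accept either a double-quoted string or a single unquoted token, the same quoted-or-bare idea the TA regex uses for every field. Below is a Python sketch of just that last group, assuming (as is typical, but not confirmed in this thread) that a user agent containing spaces is logged in double quotes. `UA_GROUP`, `user_agent`, and the quoted tail are hypothetical; in the original post's regex you would write the groups as `(?<name>...)` instead of Python's `(?P<name>...)`.

```python
import re
from typing import Optional

# Hypothetical replacement for the final group of the original regex:
# either a double-quoted value (spaces allowed) or one unquoted token.
# \S+ stops at the first whitespace, so a bare "-" cannot swallow the
# dvc_ip and port fields that follow it.
UA_GROUP = re.compile(r'"(?P<ua_quoted>[^"]*)"|(?P<ua_bare>\S+)')

def user_agent(tail: str) -> Optional[str]:
    """Return the user-agent value at the start of `tail` (quotes stripped)."""
    m = UA_GROUP.match(tail)
    if m is None:
        return None
    quoted = m.group("ua_quoted")
    return quoted if quoted is not None else m.group("ua_bare")

# Tail of the sample event posted in this thread (user agent is a bare "-").
DASH_TAIL = '- 10.115.6.11 177 80 - "none" "none"'
# Hypothetical tail with a quoted user agent, using the example string above.
UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"
QUOTED_TAIL = f'"{UA}" 10.115.6.11 177 80'

if __name__ == "__main__":
    print(user_agent(DASH_TAIL))    # -
    print(user_agent(QUOTED_TAIL))  # the full Mozilla/5.0 ... string
```

The dash is still captured, but now as a one-character value, which a `nullif(<field>, "-")` can then turn into null.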


Richfez
SplunkTrust

Can you paste one of the events that are being mis... mis... what's the word? Misinterpreted? Misregexed? Misparsed? Well, no matter on the term. 🙂

Where in the event is "10.106.4.11"? You could just be missing one extraction or something.

FYI, you can make a character or group optional with ?. For example, ab?c matches abc or ac, because the ? after the b makes the b optional.
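A quick Python check of the `?` quantifier. It also shows that the original regex's `[\"]{0,1}` is just a longer way of writing `"?` (an optional double quote). The pattern names and the `matches` helper are illustrative only.

```python
import re

# "?" makes the preceding character optional: zero or one occurrence.
AB_C = re.compile(r"ab?c")
# The same works for a group: (?:xy)? makes the whole "xy" optional.
XY_Z = re.compile(r"(?:xy)?z")

# [\"]{0,1} from the original regex and "? both mean "an optional double quote".
OPT_QUOTE_LONG = re.compile(r'[\"]{0,1}none[\"]{0,1}')
OPT_QUOTE_SHORT = re.compile(r'"?none"?')

def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None

if __name__ == "__main__":
    for s in ["abc", "ac", "abbc"]:
        print(s, matches(AB_C, s))  # abc True, ac True, abbc False
    for s in ['"none"', "none"]:
        print(s, matches(OPT_QUOTE_LONG, s), matches(OPT_QUOTE_SHORT, s))
```

Note that optional quotes on each side are independent, so `"?none"?` would also accept an unbalanced `"none`; the quoted-or-bare alternation used by the TA avoids that.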


daniel_augustyn
Contributor

2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - 302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"
