Blue Coat Proxy Logs - User Agent Field Extraction

daniel_augustyn
Contributor

I can't figure out how to extract the User Agent field from Blue Coat proxy logs. I haven't found a correct answer on the forum yet; every regex in the answers I went through failed to work correctly.

REGEX = (?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<time_taken>[^\s]+)\s+(?<c_ip>[^\s]+)\s+(?<cs_username>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<sc_status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<cs_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<cs_host>[^\s]+)\s+(?<cs_uri_port>[^\s]+)\s+(?<cs_uri_path>[^\s]+)\s+(?<cs_uri_query>[^\s]+)\s+(?<cs_uri_extension>[^\s]+)\s+[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}

[\"]{0,1}(?<http_user_agent>[^\"]+)[\"]{0,1}

This last part extracts dashes, together with a dvc_ip, like this:

  • 10.106.4.11
  • 10.106.4.11

Has anyone already sorted out this issue?

1 Solution

rfaircloth_splu
Splunk Employee

This looks like Blue Coat 6.5+ logging, which is now covered by the Splunk-supported TA.

(?:"([^"]+)"|(\S+))\s+(?:"(\d{1,2}:\d{1,2}:\d{1,2})"|(\d{1,2}:\d{1,2}:\d{1,2}))\s+(?:"(\d+)"|(\d+))\s+(?:"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s+(?:"([^"]+)"|(\S+))\s*$

I recommend pulling the TA from Splunkbase and starting with the existing solution. What I don't like is that the current code permits "-" to enter the field values, so I would add EVAL-field = nullif() to address that issue.

Link for the new TA
TA for Bluecoat
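
As a rough illustration of the two ideas in this answer (each field is either a quoted value or a bare token, and a lone "-" should become a null), here is a minimal Python sketch. It is not the TA's code: the function names and the sample fragment are hypothetical, and `null_if_dash` only mimics what a `nullif()` eval would do inside Splunk.

```python
import re

# One field in the style of the pattern above: a quoted value or a bare token.
FIELD = re.compile(r'"([^"]+)"|(\S+)')


def split_fields(line):
    """Split a space-delimited log line into fields, honoring double quotes."""
    # findall returns (quoted, bare) tuples; the alternative that did not match is ''.
    return [quoted if quoted else bare for quoted, bare in FIELD.findall(line)]


def null_if_dash(value):
    """Rough Python analogue of nullif(value, "-"): a lone dash means 'no value'."""
    return None if value == "-" else value


# Hypothetical fragment of a log line, not taken from the thread.
fields = [null_if_dash(f) for f in split_fields('GET - "Mozilla/5.0 (X11)" 200')]
print(fields)  # ['GET', None, 'Mozilla/5.0 (X11)', '200']
```

The point of the second step is that a placeholder dash never ends up as a literal field value.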

daniel_augustyn
Contributor

This totally fixed the issue. I downloaded the new add-on for the new logs, played with the regex, and it worked. Thanks a lot for your help on that.

rfaircloth_splu
Splunk Employee

The posted example does not contain a user agent string. One would look like this:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0

daniel_augustyn
Contributor

It does contain "-", and this is why this log is messing up the field extraction!

daniel_augustyn
Contributor

Any idea how to include a "-" in the regex?

Richfez
SplunkTrust

Can you paste one of the events that are being mis... mis... what's the word? Misinterpreted? Misregexed? Misparsed? Well, no matter on the term. 🙂

Where in the event is "10.106.4.11"? You could just be missing one extraction or something.

FYI, optional characters or groups can be done like ab?c, which would match abc or ac, because the b would be optional since it's followed by a ?.
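
A quick way to check the optional-character behaviour described above is a small Python sketch. The `matches` helper is a hypothetical name, and Python's `re` stands in for Splunk's PCRE here; the `?` quantifier behaves the same way in both for these simple cases.

```python
import re


def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# A single optional character: the b may appear once or not at all.
print(matches(r"ab?c", "abc"))   # True
print(matches(r"ab?c", "ac"))    # True
print(matches(r"ab?c", "abbc"))  # False, ? allows at most one b

# A whole group can be made optional the same way.
print(matches(r"a(?:bc)?d", "ad"))    # True
print(matches(r"a(?:bc)?d", "abcd"))  # True
print(matches(r"a(?:bc)?d", "abd"))   # False, the group is all or nothing
```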

daniel_augustyn
Contributor

2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED "Technology/Internet" - 302 TCP_NC_MISS GET text/html http portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"
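
To see why the pattern from the original question misfires on this event, here is a minimal Python sketch. It rebuilds that pattern from its field names and compares its user agent tail with a quoted-or-bare alternation in the style of the accepted answer. Assumptions: Python's `re` needs `(?P<name>...)` where Splunk accepts `(?<name>...)`, and the names `HEAD`, `original`, `fixed`, `ua_quoted` and `ua_bare` are made up for illustration.

```python
import re

EVENT = ('2016-01-18 04:09:52 226 10.115.2.45 - - - OBSERVED '
         '"Technology/Internet" - 302 TCP_NC_MISS GET text/html http '
         'portal.domain.net 80 / - - - 10.115.6.11 177 80 - "none" "none"')

# Field names from the question's regex, in order, around the quoted category.
BEFORE = ["date", "time", "time_taken", "c_ip", "cs_username",
          "cs_auth_group", "x_exception_id", "filter_result"]
AFTER = ["http_referrer", "sc_status", "action", "cs_method",
         "http_content_type", "cs_uri_scheme", "cs_host", "cs_uri_port",
         "cs_uri_path", "cs_uri_query", "cs_uri_extension"]

HEAD = r"\s+".join(
    [rf"(?P<{name}>\S+)" for name in BEFORE]
    + [r'"(?P<category>[^"]+)"']
    + [rf"(?P<{name}>\S+)" for name in AFTER]
)

# Tail as posted: optional quote, then everything up to the next quote.
original = re.compile(HEAD + r'\s+["]{0,1}(?P<http_user_agent>[^"]+)["]{0,1}')

# Alternative tail: either a fully quoted value or a single bare token.
fixed = re.compile(HEAD + r'\s+(?:"(?P<ua_quoted>[^"]*)"|(?P<ua_bare>\S+))')

original_ua = original.search(EVENT).group("http_user_agent")
m = fixed.search(EVENT)
fixed_ua = m.group("ua_quoted") if m.group("ua_quoted") is not None else m.group("ua_bare")

print(repr(original_ua))  # '- 10.115.6.11 177 80 - '
print(repr(fixed_ua))     # '-'
```

Because the user agent here is an unquoted "-", `[^"]+` keeps consuming until it reaches the quote in front of "none", which is how the dashes and the device IP end up in the field. The alternation stops at the first whitespace when there is no opening quote.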
