Splunk Search

What is regex to extract the "http_user_agent" field from proxy logs when the content value is AND is not available?

hartfoml
Motivator

There is a field in my Bluecoat Proxy logs that is not extracting correctly.

Here are portions of the two possible log formats:

2015-02-02 14:59:08 1170 x.x.x.x - - - OBSERVED "Technology/Internet" - 200 TCP_NC_MISS POST application/json;charset=utf-8 http www.umeng.com 80 /check_config_update - - - y.y.y.y 185 659 - "none" "none" x.x.x.x "Tengine" www.umeng.com

2015-02-02 14:54:09 939 x.x.x.x - - - OBSERVED "Business/Economy" http://cloudcroftwebcam.com/camera-1/ 200 TCP_MISS GET image/jpeg http cloudcroftwebcam.com 80 /camera1.jpg ?Mon%20Feb%202%2007:54:08%20MST%202015 jpg "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)" y.y.y.y 71592 367 - "none" "none" x.x.x.x "Apache" cloudcroftwebcam.com

The field that is not extracting correctly is the http_user_agent field. In the top record, this field is the third "-", just before the "y.y.y.y" IP. In the lower record, the field is the content between the quote marks: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)".

This is my regex:

\s+\"(?<http_user_agent_new>[^\"]+)\"\s+

This works well when there is content between the quotes, but not when there is no content and just a "-" with no quotes.
I have tried this regex:

\s+(?<http_user_agent_new>[^\s]+)\s+

This works until there is a space inside the quotes; the match then stops at that space.
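
A minimal sketch of that behaviour, written in Python so it can be checked outside Splunk. The fragment and its IP address are made up for illustration, and Python spells named groups as (?P<name>...) where Splunk accepts (?<name>...).

```python
import re

# Hypothetical fragment: a quoted user agent followed by an IP address.
fragment = ' "Mozilla/5.0 (compatible; MSIE 9.0)" 198.51.100.7'

# Same idea as the second attempt: capture a run of non-whitespace characters.
pattern = re.compile(r'\s+(?P<http_user_agent_new>[^\s]+)\s+')

match = pattern.search(fragment)
captured = match.group('http_user_agent_new')

# [^\s]+ cannot cross the first space inside the quotes,
# so only the leading part of the user agent is captured.
print(captured)
```

Because [^\s]+ excludes every whitespace character, the capture ends at the first space in the quoted value, no matter where the closing quote is.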
I tried this regex:

[\s\"|\s+](?<http_user_agent_new>[^[\"|\s]]+)[\"\s+|\s+]

But this has too many ORs for the regex engine to understand what I want.
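
One likely reason the third attempt fails: square brackets define a character class, so a | or + inside them is a literal character rather than an operator. The sketch below, in Python and using made-up test strings, shows the first bracket expression accepting a single | or +, and contrasts it with alternation inside a group.

```python
import re

# The first bracket expression from the attempt above, unchanged.
char_class = re.compile(r'[\s\"|\s+]')

# It matches exactly one character, and | and + count as ordinary members.
single_chars = [' ', '"', '|', '+']
class_results = [char_class.fullmatch(ch) is not None for ch in single_chars]
print(class_results)

# Alternation between whole sub-patterns needs a group, not brackets:
# either a quoted string (spaces allowed) or a bare dash.
alternation = re.compile(r'(?:"[^"]*"|-)')
quoted_ok = alternation.fullmatch('"Mozilla/5.0 (compatible)"') is not None
dash_ok = alternation.fullmatch('-') is not None
print(quoted_ok, dash_ok)
```

The group (?:"[^"]*"|-) expresses "quoted content OR a bare dash" directly, which is the choice the question asks for.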

How can I search for the dash with no quotes when there is no "http_user_agent" content and search for the content between the quotes when there is?


dvwijk
Explorer

Hi,

This one works for both examples, though I don't think it is the cleanest:

^.*?(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]).*?".*?".*?(?<http_user_agent>-|".*?")\s(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])

Danny
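
The regex above can be checked against both sample records with a short Python sketch. Two things here are assumptions, not part of the original posts: the masked x.x.x.x and y.y.y.y values are replaced with placeholder addresses from the documentation ranges (192.0.2.10 and 198.51.100.7), and the named group is written as (?P<name>...) because Python's re module requires that spelling.

```python
import re

# IPv4 building blocks, copied from the answer's regex.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
IPV4 = r'(?:' + OCTET + r'\.){3}' + OCTET

# First IP, then the first quoted field (the category), then either a bare
# dash or a quoted string that is directly followed by a second IP.
pattern = re.compile(
    r'^.*?' + IPV4 + r'.*?".*?".*?(?P<http_user_agent>-|".*?")\s' + IPV4
)

# Sample records from the question, with placeholder IPs (assumption).
no_agent = (
    '2015-02-02 14:59:08 1170 192.0.2.10 - - - OBSERVED "Technology/Internet" '
    '- 200 TCP_NC_MISS POST application/json;charset=utf-8 http www.umeng.com '
    '80 /check_config_update - - - 198.51.100.7 185 659 - "none" "none" '
    '192.0.2.10 "Tengine" www.umeng.com'
)
with_agent = (
    '2015-02-02 14:54:09 939 192.0.2.10 - - - OBSERVED "Business/Economy" '
    'http://cloudcroftwebcam.com/camera-1/ 200 TCP_MISS GET image/jpeg http '
    'cloudcroftwebcam.com 80 /camera1.jpg ?Mon%20Feb%202%2007:54:08%20MST%202015 '
    'jpg "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)" '
    '198.51.100.7 71592 367 - "none" "none" 192.0.2.10 "Apache" cloudcroftwebcam.com'
)

agents = [pattern.search(line).group('http_user_agent')
          for line in (no_agent, with_agent)]
for agent in agents:
    print(agent)
```

Note that the captured value keeps the surrounding quote marks when a user agent is present. If they are unwanted, they can be stripped afterwards or the group can be moved inside the quotes.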
