I'm forwarding logs via syslog over UDP to a box and ingesting them locally with Splunk. I don't think that contributes to my issue, but I wanted to mention it.
Running any of the saved searches in the Bluecoat app shows the external host IP (the one the client is accessing) as the Client IP, which is wrong. The username field has the same issue.
Here's a sample of one of my bluecoat log entries:
2015-11-05 16:05:14 763 10.80.64.129 cajones - us-ads.openx.net 173.241.244.221 None - - PROXIED "Web Ads/Analytics" http://lagrangenews.com/news/5895/students-strive-to-serve 200 TCP_NC_MISS GET application/json http us-ads.openx.net 80 /w/1.0/acj ?o=5040266117&callback=OX_5040266117&ju=http%3A//lagrangenews.com/news/5895/students-strive-to-serve&jr=http%3A//lagrangenews.com/&auid=538038002&dims=1419x731&adxy=0%2C0&res=1440x900x32&plg=pm&ch=utf-8&tz=300&ws=1419x731&ifr=0&tws=1419x731&vmt=1&bi=66daec33-f482-4821-b52e-7f07e884dfe3&sd=29 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Media Center PC 6.0; BRI/2)" 10.75.95.91 1906 2550 - "none" "none" none
In the entry above, the cs_username field is seen as "unknown", but in the following entry the cs_username is correct:
2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED "Web Ads/Analytics" http://bcp.crwdcntrl.net/5/c=1226/rand=328960996/pv=y/int=%23OpR%2358075%23DailyMail%20%3A%20Time%20of%20Day%20%3A%2010AM%C2%A0/int=%23OpR%2358689%23Dailymail%20%3A%20Weather-current-description%20%3A%20Cloudy%20with%20outbreaks%20of%20Rain/int=%23OpR%2358690%23Dailymail%20%3A%20Weather-current-temperature%20%3A%2043%C2%B0F/int=%23OpR%2358691%23Dailymail%20%3A%20Weather-upcoming-description%20%3A%20Scattered%20Showers/int=%23OpR%2358692%23Dailymail%20%3A%20Weather-upcoming-temperature%20%3A%2046%C2%B0F/med=%23OpR%2350629%23DailyMail%20%3A%20Home%20Page%20Date%20%3A%20Thursday%2C%20Nov%205th%202015/seg=%23OpR%2350561%23Date%20%3A%20Thursday%2C%20Nov%205th%202015/ug=%23OpR%2350557%23GrapeShot%20%3A%20Channel%20%3A%20gv_weightwatchers/ug=%23OpR%2350558%23GrapeShot%20%3A%20Channel%20%3A%20gv_weightwatchers/ug=%23OpR%2350559%23GrapeShot%20%3A%20US%20Channel%20%3A%20us_negative_crime/ug=%23OpR%2350560%23GrapeShot%20%3A%20US%20Channel%20%3A%20us_negative_crime/genp=%23OpR%2330426%23Site%20Section%20%3A%20index/genp=%23OpR%2330427%23Site%20Section%20%3A%20ushome/rt=ifr 204 TCP_NC_MISS GET image/png;charset=UTF-8 http su.addthis.com 80 /red/usync ?pid=11127&puid=ce5754badf674f9ba73d138adc3e8e1a - "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko" 10.0.2.248 451 2394 - "none" "none"
and the regex in use is:
[bcreportermain_v1]
REGEX = (?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)\s+(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)\s+(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+)\s+\"(?P<category>[^\"]+)\"\s+(?P<http_referrer>[^\s]+)\s+(?P<sc_status>[^\s]+)\s+(?P<action>[^\s]+)\s+(?P<cs_method>[^\s]+)\s+(?P<http_content_type>[^\s]+)\s+(?P<cs_uri_scheme>[^\s]+)\s+(?P<cs_host>[^\s]+)\s+(?P<cs_uri_port>[^\s]+)\s+(?P<cs_uri_path>[^\s]+)\s+(?P<cs_uri_query>[^\s]+)\s+(?P<cs_uri_extension>[^\s]+)\s+\"(?P<http_user_agent>[^\"]+)\"\s+(?P<s_ip>[^\s]+)\s+(?P<sc_bytes>[^\s]+)\s+(?P<cs_bytes>[^\s]+)\s+\"?(?P<x_virus_id>[^\"]+)\"?\s+\"(?P<x_bluecoat_application_name>[^\"]+)\"\s+\"(?P<x_bluecoat_application_operation>[^\"]+)\"
Any help you can provide would be appreciated.
Hi banderson7,
The regex is not anchored at the beginning of the line (^), and because the two entries have a different number of whitespace-separated fields, it mismatches where the first field is. Note how the cajones entry has extra fields before the quoted category, ending in "None - - PROXIED" (this causes the initial mismatch).
2015-11-05 16:05:14 763 10.80.64.129 cajones - us-ads.openx.net 173.241.244.221 None - - PROXIED "Web Ads/Analytics"
2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED "Web Ads/Analytics"
I'm not sure which fields are important (check with Bluecoat), but if it's okay to ignore the "None - - PROXIED" fields, then the following regex will work better.
^(?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)\s+(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)\s+(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+).*?\"(?P<category>[^\"]+)\"\s+(?P<http_referrer>[^\s]+)\s+(?P<sc_status>[^\s]+)\s+(?P<action>[^\s]+)\s+(?P<cs_method>[^\s]+)\s+(?P<http_content_type>[^\s]+)\s+(?P<cs_uri_scheme>[^\s]+)\s+(?P<cs_host>[^\s]+)\s+(?P<cs_uri_port>[^\s]+)\s+(?P<cs_uri_path>[^\s]+)\s+(?P<cs_uri_query>[^\s]+)\s+(?P<cs_uri_extension>[^\s]+)\s+\"(?P<http_user_agent>[^\"]+)\"\s+(?P<s_ip>[^\s]+)\s+(?P<sc_bytes>[^\s]+)\s+(?P<cs_bytes>[^\s]+)\s+\"?(?P<x_virus_id>[^\"]+)\"?\s+\"(?P<x_bluecoat_application_name>[^\"]+)\"\s+\"(?P<x_bluecoat_application_operation>[^\"]+)\"
There's an initial anchor (^) to match the start of the line, and the \s+ between filter_result and category has been changed to .*? (lazily match anything up to the "). You can play with it here if you want to modify the regex further: https://regex101.com/r/bK5hR5/1
Hope this helps.
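To see the effect of the anchor and the lazy .*? outside Splunk, here is a minimal sketch in Python. It assumes Python's re module behaves like Splunk's PCRE for this pattern (both accept the (?P<name>...) syntax), and it uses only the part of each regex up to the quoted category, applied to the two shortened sample lines above. The helper name extract is made up for illustration.

```python
import re

# Only the fields up to the quoted category, taken from the regexes above.
HEAD = (
    r'(?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)\s+'
    r'(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)\s+'
    r'(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+)'
)
CATEGORY = r'"(?P<category>[^"]+)"'

original = re.compile(HEAD + r'\s+' + CATEGORY)      # unanchored, strict \s+
fixed = re.compile(r'^' + HEAD + r'.*?' + CATEGORY)  # anchored, lazy skip

long_entry = ('2015-11-05 16:05:14 763 10.80.64.129 cajones - us-ads.openx.net '
              '173.241.244.221 None - - PROXIED "Web Ads/Analytics"')
short_entry = ('2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED '
               '"Web Ads/Analytics"')

def extract(pattern, line):
    """Return (c_ip, cs_username) for the first match, or None (hypothetical helper)."""
    m = pattern.search(line)
    return (m.group('c_ip'), m.group('cs_username')) if m else None

print(extract(original, long_entry))  # the unanchored match slides right
print(extract(fixed, long_entry))
print(extract(fixed, short_entry))
```

In this sketch the unanchored pattern starts matching at "cajones" on the long entry, which is why the external host IP lands in c_ip. Note that with the anchored version, x_exception_id and filter_result on the long entry pick up "us-ads.openx.net" and "173.241.244.221", so only the fields before them and from category onward line up.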
The issue here is that the two samples do not follow the same format. Things break down at the category field extraction, because category is the first field enclosed in quotes and needs a different capture than the other fields. The extraction expects it to be the 9th field, but it is in the 13th place in the broken message (splitting the fields by spaces).
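A quick way to check the field position is to count the space-separated tokens before the first quote. This is only a sketch using the two shortened sample lines from this thread; category_position is a made-up helper name.

```python
def category_position(line):
    """1-based position of the first quoted field when splitting on whitespace."""
    before_quote = line.split('"', 1)[0]
    return len(before_quote.split()) + 1

long_entry = ('2015-11-05 16:05:14 763 10.80.64.129 cajones - us-ads.openx.net '
              '173.241.244.221 None - - PROXIED "Web Ads/Analytics"')
short_entry = ('2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED '
               '"Web Ads/Analytics"')

print(category_position(short_entry))  # 9
print(category_position(long_entry))   # 13
```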
Yeah, so I see. After researching, I have 2 bluecoats sending with this field list:
Fields: date time time-taken c-ip cs-username cs-auth-group s-supplier-name s-supplier-ip s-supplier-country s-supplier-failures x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id x-bluecoat-application-name x-bluecoat-application-operation cs-threat-risk
and a third sending these fields
date time time-taken c-ip cs-username cs-auth-group x-exception-id sc-filter-result cs-categories cs(Referer) sc-status s-action cs-method rs(Content-Type) cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs-uri-extension cs(User-Agent) s-ip sc-bytes cs-bytes x-virus-id x-bluecoat-application-name x-bluecoat-application-operation
I need to figure out how to import both of these correctly. I'll be playing with that site some more, methinks.
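One possible way to handle both field lists with a single pattern is to make the four s-supplier-* fields an optional group. The sketch below is in Python and untested in Splunk itself; the group names s_supplier_name, s_supplier_ip, s_supplier_country, and s_supplier_failures are my own guesses derived from the field list, not names the Bluecoat app is known to use. It covers only the fields up to the category and ignores the trailing cs-threat-risk field of the longer format.

```python
import re

BOTH_FORMATS = re.compile(
    r'^(?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)'
    r'\s+(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)'
    # Optional block: only the longer field list carries these four fields.
    r'(?:\s+(?P<s_supplier_name>[^\s]+)\s+(?P<s_supplier_ip>[^\s]+)'
    r'\s+(?P<s_supplier_country>[^\s]+)\s+(?P<s_supplier_failures>[^\s]+))?'
    r'\s+(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+)'
    r'\s+"(?P<category>[^"]+)"'
)

long_entry = ('2015-11-05 16:05:14 763 10.80.64.129 cajones - us-ads.openx.net '
              '173.241.244.221 None - - PROXIED "Web Ads/Analytics"')
short_entry = ('2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED '
               '"Web Ads/Analytics"')

long_fields = BOTH_FORMATS.match(long_entry).groupdict()
short_fields = BOTH_FORMATS.match(short_entry).groupdict()

# Groups inside the optional block are None when the block did not match.
print(long_fields['c_ip'], long_fields['s_supplier_ip'], long_fields['filter_result'])
print(short_fields['c_ip'], short_fields['s_supplier_ip'], short_fields['filter_result'])
```

The regex engine first tries to consume the optional block and backtracks out of it when the quoted category does not follow in the right place, so the short format still matches with the supplier groups left empty.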
You need to look in the Bluecoat console and reset the third log to bcreportermain_v1.
I guess I could go the easy way 🙂
That's great, and thanks so much for the regex site. I do think that'll do the trick.
Here is another tip: you can test and verify it in Splunk by running this command:
/opt/splunk/bin/splunk cmd pcregextest mregex="^(?P<date>[^\s]+)\s+(?P<time>[^\s]+)\s+(?P<time_taken>[^\s]+)\s+(?P<c_ip>[^\s]+)\s+(?P<cs_username>[^\s]+)\s+(?P<cs_auth_group>[^\s]+)\s+(?P<x_exception_id>[^\s]+)\s+(?P<filter_result>[^\s]+).*?\"(?P<category>[^\"]+)\"\s+(?P<http_referrer>[^\s]+)\s+(?P<sc_status>[^\s]+)\s+(?P<action>[^\s]+)\s+(?P<cs_method>[^\s]+)\s+(?P<http_content_type>[^\s]+)\s+(?P<cs_uri_scheme>[^\s]+)\s+(?P<cs_host>[^\s]+)\s+(?P<cs_uri_port>[^\s]+)\s+(?P<cs_uri_path>[^\s]+)\s+(?P<cs_uri_query>[^\s]+)\s+(?P<cs_uri_extension>[^\s]+)\s+\"(?P<http_user_agent>[^\"]+)\"\s+(?P<s_ip>[^\s]+)\s+(?P<sc_bytes>[^\s]+)\s+(?P<cs_bytes>[^\s]+)\s+\"?(?P<x_virus_id>[^\"]+)\"?\s+\"(?P<x_bluecoat_application_name>[^\"]+)\"\s+\"(?P<x_bluecoat_application_operation>[^\"]+)\"" test_str="2015-11-05 16:01:53 54 10.0.19.44 hammlx1 - - OBSERVED \"Web Ads/Analytics\" http://bcp.crwdcntrl.net/5/c=1226/rand=328960996/pv=y/int=%23OpR%2358075%23DailyMail%20%3A%20Time%20... 204 TCP_NC_MISS GET image/png;charset=UTF-8 http su.addthis.com 80 /red/usync ?pid=11127&puid=ce5754badf674f9ba73d138adc3e8e1a - \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" 10.0.2.248 451 2394 - \"none\" \"none\""
and this will be the result:
#### Capturing group data #####
Group | Name | Value
--------------------------------------
1 | date | 2015-11-05
2 | time | 16:01:53
3 | time_taken | 54
4 | c_ip | 10.0.19.44
5 | cs_username | hammlx1
6 | cs_auth_group | -
7 | x_exception_id | -
8 | filter_result | OBSERVED
9 | category | Web Ads/Analytics
10 | http_referrer | http://bcp.crwdcntrl.net/5/c=1226/rand=328960996/pv=y/int=%23OpR%2358075%23DailyMail%20%3A%20Time%20...
11 | sc_status | 204
12 | action | TCP_NC_MISS
13 | cs_method | GET
14 | http_content_type | image/png;charset=UTF-8
15 | cs_uri_scheme | http
16 | cs_host | su.addthis.com
17 | cs_uri_port | 80
18 | cs_uri_path | /red/usync
19 | cs_uri_query | ?pid=11127&puid=ce5754badf674f9ba73d138adc3e8e1a
20 | cs_uri_extension | -
21 | http_user_agent | Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
22 | s_ip | 10.0.2.248
23 | sc_bytes | 451
24 | cs_bytes | 2394
25 | x_virus_id | -
26 | x_bluecoat_application_name | none
27 | x_bluecoat_application_operation | none
cheers, MuS
Yes, it is a very useful site. Please accept the answer if it has helped.