Splunk for Blue Coat ProxySG: Using " " as a delimiter, how to redefine a field with the " character to prevent field extraction issues?

DavidHourani
Super Champion

Hello,

I set up the Splunk for Blue Coat app some time ago and have been using it since. I recently noticed errors in my reports, and I think the cause is the field extraction mechanism the app uses.

The app uses " " (a space) as the delimiter (DELIMS). This does not account for URLs and URIs that contain the " character, which breaks parsing of the entire log line and makes most fields overlap.
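To illustrate the failure mode described above, here is a minimal Python sketch of a space-delimited, quote-aware splitter. It is a simplified model written for this thread, not Splunk's actual parser, and both log fragments are shortened, hypothetical lines.

```python
def split_quote_aware(line: str) -> list[str]:
    """Split on spaces, treating text between double quotes as one token.

    Simplified model for illustration only; not Splunk's real implementation.
    """
    tokens, current, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes      # every quote toggles the state
        elif ch == ' ' and not in_quotes:
            tokens.append(''.join(current))
            current = []
        else:
            current.append(ch)
    tokens.append(''.join(current))
    return tokens


# Hypothetical, shortened log fragments (assumptions, not real data).
clean = 'GET - test.com - http://test.com/page - - PROXIED "none" 1031 "proxyname"'
broken = 'GET - test.com - http://test.com/pa"ge - - PROXIED "none" 1031 "proxyname"'

clean_tokens = split_quote_aware(clean)
broken_tokens = split_quote_aware(broken)

print(len(clean_tokens), clean_tokens)
print(len(broken_tokens), broken_tokens)
```

In this model, the single stray quote in the URL flips the quoting state for the rest of the line, so everything after it collapses into one token and the later fields no longer line up with their names.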

Any idea how I can redefine the URL field without having to redefine all the other fields?

Best regards,
Hourani

DavidHourani
Super Champion

I ended up writing a few regular expressions to extract the fields properly. The default props.conf shipped with the app does not handle these log lines correctly and produces many overlapping fields.

MOberschelp
Explorer

Can you post your regex here?

DavidHourani
Super Champion

Sure. I haven't extracted all the fields, but here is what I have so far:

EXTRACT-time_taken,src_ip,code_retour,action,bytes_in,bytes_out,cs_method,dest_host = ^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)
EXTRACT-http_url = ^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))
EXTRACT-src_user = \s\-\s(?P<src_user>(\w{6}|\-)?(?=\s\-\s))(?:[^\s]*\s){5}\-\s\-\s(?:[^\s]*\s\".+\"\s[^\s]*\s\".+\")\Z
EXTRACT-filter_result = \s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z
EXTRACT-category = \s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z
EXTRACT-proxy_name = \s\"(?P<proxy_name>px[^\"]+?)(?:\")\Z

Here is an example log line:

2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"
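As a quick sanity check outside Splunk, the EXTRACT-http_url pattern can be tried against the example line with Python's re module. This is only a sketch: Splunk uses PCRE, so behaviour may differ in edge cases, and the variant with quotes in the URL is a hypothetical line made up for the test.

```python
import re

# Pattern copied from the EXTRACT-http_url line above.
HTTP_URL = re.compile(
    r'^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))'
)

# The example log line from this post.
SAMPLE = ('2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - '
          'test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 '
          '- - PROXIED "none" 1031 "proxyname"')

# Hypothetical variant with double quotes inside the URL (an assumption).
QUOTED = SAMPLE.replace('thisisatest', 'this"is"atest')


def extract_url(line: str):
    """Return the http_url capture, or None if the pattern does not match."""
    match = HTTP_URL.search(line)
    return match.group('http_url') if match else None


print(extract_url(SAMPLE))
print(extract_url(QUOTED))
```

Because the pattern skips the first 12 whitespace-separated tokens and then stops at the run of " - " placeholders, quotes inside the URL do not affect the capture.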

MOberschelp
Explorer

Great, this looks similar to my logs. Did you have the same problem with the log header?
I tried to ignore the header lines with an entry in props.conf, but it doesn't work.

I have a sourcetype called "bcoat:proxysg" (I also tried bcoat_proxysg), but the stanza has no effect.

[bcoat:proxysg]
TZ = Europe/Berlin
HEADER_FIELD_LINE_NUMBER = 2
CHECK_FOR_HEADER = true

DavidHourani
Super Champion

Yep, same problem with the header ^^ I simply wrote a SEDCMD that blanks out any line that does not contain the date field, which gets rid of the junk lines and headers:

SEDCMD-<class> = s/^(?!.*\d{4}-\d{2}-\d{2}.*\s).*//g
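The substitution can be tried line by line in Python to see what it keeps and what it blanks. This is a rough sketch only: Splunk applies SEDCMD to the raw event with PCRE, and the three header lines below are assumed examples, not taken from a real log.

```python
import re

# Pattern copied from the SEDCMD above (the part between s/ and //g).
HEADER_STRIP = re.compile(r'^(?!.*\d{4}-\d{2}-\d{2}.*\s).*')


def apply_sed(line: str) -> str:
    """Blank the line unless it contains a YYYY-MM-DD date followed by whitespace."""
    return HEADER_STRIP.sub('', line)


lines = [
    '#Software: SGOS',                                # assumed header line
    '#Fields: date time time-taken c-ip',             # assumed header line
    '#Date: 2015-04-15 01:02:03',                     # assumed header line with a date
    '2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS',  # start of the sample event
]

results = [apply_sed(line) for line in lines]
for before, after in zip(lines, results):
    print(repr(before), '->', repr(after))
```

Note that the pattern checks for a date anywhere in the line, not only at the start, so a header that itself carries a date (like the assumed #Date line here) would be left untouched.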

Let me know if it works for you.
