Splunk for Blue Coat ProxySG: Using " " as a delimiter, how to redefine a field with the " character to prevent field extraction issues?

DavidHourani
Super Champion

Hello,

I installed the Splunk for Blue Coat app and have been using it for a while. I recently noticed errors in my reports, and I think the problem lies in the field extraction mechanism the app uses.

The app uses " " as the delimiter (DELIMS) and does not account for cases where the URL and URI fields contain the " character. When that happens, the entire log line is split incorrectly and most fields overlap.

Any idea how I can redefine the URL field without having to redefine all the other fields?

Best regards,
Hourani

DavidHourani
Super Champion

I ended up writing a few regexes to extract the fields properly. The default props.conf provided with the app does not handle this case, and many fields end up overlapping.

MOberschelp
Explorer

Can you post your regex here?

DavidHourani
Super Champion

Sure. I haven't extracted every field, but here are the ones I have so far:

EXTRACT-time_taken,src_ip,code_retour,action,bytes_in,bytes_out,cs_method,dest_host = ^(?:[^:\n]*:){2}\d+\s+(?P<time_taken>\d+)\s+(?P<src_ip>[^ ]+)\s+(?P<code_retour>\d+)[^ \n]* (?P<action>[^ ]+)\s+(?P<bytes_in>[^ ]+)\s+(?P<bytes_out>\d+)\s+(?P<cs_method>[^ ]+)\s+\-\s+(?P<dest_host>[^ ]+)
EXTRACT-http_url = ^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))
EXTRACT-src_user = \s\-\s(?P<src_user>(\w{6}|\-)?(?=\s\-\s))(?:[^\s]*\s){5}\-\s\-\s(?:[^\s]*\s\".+\"\s[^\s]*\s\".+\")\Z
EXTRACT-filter_result = \s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z
EXTRACT-category = \s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z
EXTRACT-proxy_name = \s\"(?P<proxy_name>px[^\"]+?)(?:\")\Z

Here is an example log line:

2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"
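As a quick sanity check outside Splunk, three of the expressions above can be run against the sample line with Python's `re` module, which accepts the same syntax for these particular patterns. This is only a sketch: the `extract` helper and the `QUOTED` variant (the sample line with a `"` placed inside the URL) are illustrative assumptions, not part of the app or of the original logs.

```python
import re

# Patterns copied verbatim from the EXTRACT- lines above.
HTTP_URL = re.compile(r'^(?:[^\s]*\s){12}(?P<http_url>.+?(?=\s\-\s(\w|\-)+\s\-\s(\w|\-)+))')
FILTER_RESULT = re.compile(r'\s\-\s\-\s(?P<filter_result>[^\s]+?)(?:\s\".+\"\s[^\s]+\s\".+\")\Z')
CATEGORY = re.compile(r'\s\-\s\-\s[^\s]+\s\"(?P<category>[^\"]+?)(?:\"\s[^\s]+\s\".+\")\Z')

# The sample log line from the post.
SAMPLE = ('2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS userID 154 GET - test.com - '
          'http://test.com/thisisatest - - - - test.com HTTP/1.1 - - PROXIED "none" 1031 "proxyname"')

# Hypothetical variant: same line, but the URL contains double quotes.
QUOTED = SAMPLE.replace('http://test.com/thisisatest', 'http://test.com/search?q="a"')


def extract(line):
    """Apply each pattern and collect the named groups that matched."""
    fields = {}
    for pattern in (HTTP_URL, FILTER_RESULT, CATEGORY):
        match = pattern.search(line)
        if match:
            fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    print(extract(SAMPLE))
    print(extract(QUOTED))
```

In this sketch the quoted URL comes back whole in `http_url`, and `filter_result` and `category` are unaffected, because those two patterns are anchored to the end of the line rather than counted from the left.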

MOberschelp
Explorer

Great, this looks similar to my logs. Did you have the same problem with the log header?
I tried to ignore the header lines with an entry in props.conf, but it doesn't work.

I have a sourcetype called "bcoat:proxysg" (I also tried bcoat_proxysg), but the stanza is not working.

[bcoat:proxysg]
TZ = Europe/Berlin
HEADER_FIELD_LINE_NUMBER = 2
CHECK_FOR_HEADER = true

DavidHourani
Super Champion

Yep, same problem with the header. I simply wrote a SEDCMD that blanks out any line that does not contain a date in the format of the time field, which removes the junk lines and headers:

SEDCMD-<class> = s/^(?!.*\d{4}-\d{2}-\d{2}.*\s).*//g

Let me know if it works for you.
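To see what the SEDCMD expression keeps and removes, it can be approximated with Python's `re.sub`. This is a sketch under assumptions: Splunk applies SEDCMD with its own regex engine at index time, and the header lines below are illustrative, not taken from the original logs. Note that the lookahead checks for a date anywhere in the line, so a header that itself carries a date is left in place.

```python
import re

# Expression taken from the SEDCMD above; the replacement is the empty string.
JUNK = re.compile(r'^(?!.*\d{4}-\d{2}-\d{2}.*\s).*')


def scrub(line):
    """Blank the line unless it contains a YYYY-MM-DD date followed by whitespace."""
    return JUNK.sub('', line)


# Illustrative inputs (assumed, not from the original post).
HEADER_NO_DATE = '#Fields: date time time-taken c-ip'
HEADER_WITH_DATE = '#Date: 2015-04-15 01:02:03'
DATA_LINE = '2015-04-15 01:02:03 1032 1.2.3.4 200 TCP_MISS'

if __name__ == "__main__":
    for line in (HEADER_NO_DATE, HEADER_WITH_DATE, DATA_LINE):
        print(repr(scrub(line)))
```

If headers containing dates are a concern, anchoring the date at the start of the line inside the lookahead would be one option to try, but that is untested here.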
