Getting Data In

BlueCoatProxySG to Syslog to Splunk. How do I get this to work?

fairje
Communicator

First off I have looked over the instructions contained here:
http://docs.splunk.com/Documentation/AddOns/latest/BlueCoatProxySG/About

That being said, the biggest issue is that I have been working with the understanding that feeding syslog directly into Splunk (e.g. monitor entry on port 514) is not the way to go since every time you do a restart of the Splunk service you are going to lose logs, and the lost is generally however long of a time period that the service is restarting (anywhere from 1 minute to 10 minutes... yeah... it happens).

So I have rsyslog (version 5, thanks Red Hat for lacking in the latest versions) and have set up the instructions on that to handle the logs. Because of the way that BlueCoat dumps the logs as one giant flat file (essentially) and doesn't really export their access logs in a traditional syslog fashion, this required the following template on my rsyslog:

$template bluecoat, "/opt/log/%fromhost%/syslog.log"

Just a side bar here, since rsyslog documentation isn't the greatest, and it took a very long time to figure some things out, here is all the relevant pieces from my configuration to get rsyslog to work:

#### MODULES ####

$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog   # provides kernel logging support (previously done by rklogd)
#$ModLoad immark  # provides --MARK-- message capability
$ModLoad imtcp

#### GLOBAL DIRECTIVES ####
# Needed to allow proper permissions for creating files
$umask 0000

# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat


#### Templates ####
# log every host in its own directory
$template splunklog, "/opt/log/%HOSTNAME%/%PROGRAMNAME%.log"
$template splunkmsg, "%rawmsg%\n"

#bluecoats aren't cooperating at all using this in the meantime
$template bluecoatlog, "/opt/log/%fromhost%/syslog.log"

# ## BlueCoat Ruleset ##
$RuleSet bluecoat

# Force permissions to work with Splunk as files are created
$DirOwner root
$DirGroup splunk
$FileOwner root
$FileGroup splunk
$DirCreateMode 0750
$FileCreateMode 0640

# Log everything coming in through the network (see above template)
*.* ?bluecoatlog;splunkmsg
# ## End BlueCoat Ruleset ##

# BlueCoat
$InputTCPServerBindRuleset bluecoat
$InputTCPServerRun 1514

I then set a monitor path

[monitor:///opt/log/bluecoat.mydomain.com/]
disabled = 0
host_segment = 3
index = bluecoat
sourcetype = bluecoat:proxysg:access:syslog
whitelist = syslog\.log$

Now, you would think this minor change, writing to a file before reading it into Splunk would cause a problem, but lo and behold I'm having to major issues:

1: My log rotation script breaks the log writing. No it isn't a permission issue (or shouldn't be) because it works on my other syslog data types. I am currently running the following logrotate every hour:

/opt/log/*/*.log {
    missingok
    create 0640 root splunk
    rotate 3
    sharedscripts
    compress
    postrotate
        reload rsyslog >/dev/null 2>&1 || true
    endscript
}

This is controlled by an hourly cron job such that I can keep around 3 hours of logs in case something happens. The end state would be to eventually up this to around 2 days. Anyway, the issue is that it compresses the file, and then successfully makes a new file "syslog.log" like it is supposed to, but it seems like the reload isn't quite working since at that point I just stop getting logs unless I manually do a "service rsyslog restart". Note that a service restart is much more harsh than a reload and you run the risk of losing a little bit of logs during the period that it blows away your connections (if TCP) or that logs just went to the abyss (UDP).

2: Splunk is not decoding the file correctly at all. It is just breaking each event on the line break instead of breaking them where they should be. Note that each line break is basically coming in as each packet, so the way this dumps you get multiple events per packet transmitted and it doesn't break each packet into individual events.

Anyone else have any luck setting BlueCoat up this way?

millerhack
Engager

We just recently ran into this same problem with rsyslog and our Blue Coat proxies. Here is some info that we found in case it proves helpful for anyone viewing this thread down the road.

EDIT:
We ended up getting to the bottom of this issue. The problem is rsyslog's default behavior when attempting to parse TCP syslogs. Specifically, "Octet Counted Framing" support is enabled by default; which causes typical BlueCoat access logs to be parsed incorrectly. For more info on the octet-counted framing syntax, see RFC6587 https://tools.ietf.org/html/rfc6587#section-3.4.1

TL;DR
In order to get rsyslog to even begin to properly handle BlueCoat access logs, you must disable the "octet counted framing" support on your TCP listener(s):

## rsyslog.conf
# if using the "imtcp" module
$InputTCPSupportOctetCountedFraming off
# if using the "imptcp" module (Plain TCP)
$InputPTCPSupportOctetCountedFraming off

# -- OR -- (new-style RainerScript syntax)

# if using the "imtcp" module
input(type="imtcp" port="<TCP_PORT>" ruleset="<RuleSet_Name>" supportOctetCountedFraming="off")
# if using the "imptcp" module (Plain TCP)
input(type="imptcp" port="<TCP_PORT>" ruleset="<RuleSet_Name>" supportOctetCountedFraming="off")

NOTE: The following answer has been edited to include "octet counted framing" information.

Here are some facts:

  • Blue Coat simply writes the access log file straight to the TCP socket
  • Blue Coat config allows "periodic" and "continuous" sending modes (for a syslog destination, be sure to use continuous. (See Symantec support article: https://support.symantec.com/en_US/article.TECH242216.html)
  • Blue Coat log records are delimited by CARRIAGE RETURN LINE FEED ( \r\n)
    • SIDENOTE: the carriage return combined with the rsyslog $EscapeControlCharactersOnReceive on option causes every Blue Coat message to end with #015 which is 13 in octal -- ASCII 13 = carriage return.
  • rsyslog (default config) uses LINE FEED ( \n) character as message delimiter
  • rsyslog (default config) tries to parse every line (important to understand later)

When the Blue Coat device sends access logs, the messages look something like:

2018-12-04 12:38:37 96 192.168.1.122 200 TCP_NC_MISS 999 572 GET https www.example.com 443 <... truncated ...>
2018-12-04 12:38:37 19 192.168.1.153 200 TCP_ACCELERATED 39 213 CONNECT tcp www.example.co <... truncated ...>

When you send messages like this to rsyslog, rsyslogd will emit error messages like:

rsyslogd: Framing Error in received TCP message: delimiter is not SP but has ASCII value 45. [v8.24.0-34.el7]

This is because rsyslog is assuming the client is employing "octet-counted framing" because it begins with a numeric value. The warning message is because octet-counted framing syntax requires that the "message size" header be terminated by a SPACE ( ) character, but rsyslog found a dash ( -) instead:

2018-04
    ^ here

Unfortunately, rsyslog continues with the assumption that this is an octet-counted message frame, and it reads 2018 bytes of data and uses that as the message. The next message is assumed to start at byte number 2019 and the parser will begain again at that offset into the message stream. Depending on the actual log data, this can cause what appears to be wildly erratic parsing behavior. For example, the log file that rsyslog writes might look something like this:

#Remark: 1234567890 "bluecoat01.mydomain.co" "192.168.1.5" "Main_SYSL"#015
12-04 18:45:08 43 192.168.1.205 304 TCP_HIT 707 1313 GET https www.example.com 443 <... truncated ...>
"Mozi
lla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome <... truncated ...>

This results in the Blue Coat logs getting completely garbled up in a way that they are completely useless to Splunk.

The solution to this problem is to disable Octet Counted Framing support in your TCP listener(s) which need to accept BlueCoat access logs. This can be done by adding one or more of the following config options to your rsyslog.conf file:

## rsyslog.conf

# if using the "imtcp" module
$InputTCPSupportOctetCountedFraming off

# if using the "imptcp" module (Plain TCP)
$InputPTCPSupportOctetCountedFraming off

# -- OR -- (new-style RainerScript syntax)

# if using the "imtcp" module
input(type="imtcp" port="<TCP_PORT>" ruleset="<RuleSet_Name>" supportOctetCountedFraming="off")

# if using the "imptcp" module (Plain TCP)
input(type="imptcp" port="<TCP_PORT>" ruleset="<RuleSet_Name>" supportOctetCountedFraming="off")

Final note: if you actually have syslog clients that are properly employing the "octet-counted framing" syntax, you will need to deploy at least two TCP listeners; one for octet-counted framing, and one without. Otherwise the logs from your "octet-counted framing devices" will get garbled. This could be achieved with a configuration as follows:

## rsyslog-example.conf

# syslog clients that won't break rsyslog's "octet-counted framing"
# implementation may send messages to the default syslog port (514).
input(type="imtcp" port="514" ruleset="MyTCPMessageRuleSet")

# BlueCoat access logs and other clients that might break rsyslog's
# "octet-counted framing" implementation must send messages to a custom TCP
# port (5141).
input(type="imtcp" port="5141" ruleset="MyTCPMessageRuleSet" supportOctetCountedFraming="off")

fairje
Communicator

If you can actually get Bluecoat to fix their product that would be great. I actually moved on from that project and the issue was never solved before I left. However, if you do manage to get them to fix it on their end, I'll make sure to pass that along to my contacts over there so they can fix their poor bluecoat stuff. 2 years later... it's still broken 😕

0 Karma

millerhack
Engager

@fairje, we don't plan on asking Blue Coat to change anything (that would be great, but probably fruitless). However, it does appear that admins can create their own "access log" formats in the ProxySG config.

See page 4; steps 7-12 in this PDF: https://www.cisco.com/c/dam/en/us/td/docs/security/web_security/scancenter/deployment/guide/Configur...

It also appears to be possible via the ProxySG commandline interface. Info on how to edit a format on page 108 in this PDF: https://origin-symwisedownload.symantec.com/library/SYMWISE/ENTERPRISE/sgos_command_line_interface_r...

0 Karma

fairje
Communicator

Wow, so BlueCoat and Rsyslog just really do not like playing well together. This is mostly BlueCoat's fault with the way that they are dumping the data, and I almost feel like I need to make this an FTP dump instead... Not sure if that will actually help the problem or not. But that comes with its own set of problems, so I am avoiding that for now.

As I sort through this, since noone else seems to have had this issue, I will continue to update this "Answer" in the hopes that the end state will help someone else sorting through this.

First, to fix breaking of rsyslog when it try to roll a file, I updated my template string to look like this:

$template bluecoatlog, "/opt/log/%fromhost%/%$year%-%$month%-%$day%-%$hour%-syslog.log"

This basically is making a new file every hour that the data is getting dumped into. the convenience here is that as logrotate compresses the file every hour it then is able to start making a new file because it is a new hour. The problem is that I have effectively broke the cycling of files through logrotate because it now thinks each hour is a new file and doesn't dump after 3 cycles.

I have overcome this issue by creating the following script under my /etc/cron.daily/ folder:

#!/bin/sh

DATE=$(date --date="2 days ago" +"%Y-%m-%d")

find /opt/log/ -type f -name $DATE* -exec rm {} \;

This will now remove anything that was created 2 days ago which will allow me to keep 48-72 hours of logs depending on the time of day you look. I may kick this out to 3 days ago just to ensure coverage over the entire weekend, in case something else breaks elsewhere.

The second thing I have found out is that the lack of parsing is tied to the way that rsyslog is picking up the line break character. Rather than inserting a character into the message, it is inserting #015# as ASCII. So I am going to end up changing the props for indexing time to ensure that it breaks each event on that sequence, instead of breaking on a Line Feed character.

Lastly, BlueCoat is not sending syslog messages in proper RFC format. I pointed this out in my original message, but didn't realize the problems this would cause. Because of this, any message that starts with a "z" character will be picked up as a "compressed" message, try to decompress it, and fail. It then just discards the message and moves on. This is happening about every 2 - 3 minutes I get an unlucky message like this. The "Fix" for this is going to be figuring out how to inject a

<PRI>

code into the message at the front end. This as it turns out is what Rsyslog is looking for as it decodes the messages coming in. It is either that, or just disable compression detection, I'll try to figure out which is easier. I'm not using compression anyway, so either solution should work.

0 Karma

TonyLeeVT
Builder

Did you find a fix for the rsyslog #015# issue? If not, what did you add to props.conf to line break properly?

Apparently, rsyslog does this intentionally, but it can be disabled. They are global directives called:

parser.escapeControlCharactersOnReceive
Escape8BitCharactersOnReceive

Description:

This directive instructs rsyslogd to replace control characters during reception of the message. The intent is to provide a way to stop non-printable messages from entering the syslog system as whole. If this option is turned on, all control-characters are converted to a 3-digit octal number and be prefixed with the parser.controlCharacterEscapePrefix character (being '#' by default). For example, if the BEL character (ctrl-g) is included in the message, it would be converted to "#007". To be compatible to sysklogd, this option must be turned on.

Warning:

turning on this option most probably destroys non-western character sets (like Japanese, Chinese and Korean)
turning on this option destroys digital signatures if such exists inside the message
if turned on, the drop-cc, space-cc and escape-cc property replacer options do not work as expected because control characters are already removed upon message reception. If you intend to use these property replacer options, you must turn off $EscapeControlCharactersOnReceive.

http://www.rsyslog.com/doc/rsconf1_escapecontrolcharactersonreceive.html
http://www.rsyslog.com/doc/rsconf1_escape8bitcharsonreceive.html

Thank you for your efforts and level of detail.

fairje
Communicator

Thanks, I never got around to updating my post with that information. I did find that and put it in there, but as I got held up on the 3rd issue (BlueCoat not formatting the logs per RFC), I didn't post it as a solution because it only partially fixes the issue.

You still get messages starting with the "z" character and that makes it think the message is compressed and so it tries to expand that message and fails and then just dumps the entire log out.

I ended up posting the question over on the official rsyslog forums and I didn't really get anywhere with that either just yet. They suggested either writing the logs somewhere else first and then pulling those logs via rsyslog's imfile module or "mmnormalize module to parse the log lines". I don't think imfile is an option for my case (if I'm going to go through that trouble, I'll just switch to FTP and have splunk monitor that file instead) and I haven't had a chance yet to research into "mmnormalize" as I have been busy with some other issues.

At some point I'll have to circle back around on this and solve it. If I do before someone else posts a solution I'll be sure to update my answer!

If you want to see the little bit of back and forth on the rsyslog forums please check out this:
http://kb.monitorware.com/bluecoat-and-rsyslog-t12738.html

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...