
How to handle changing log file formats - BlueCoat?

kristian_kolb
Ultra Champion

The infrastructure department of an enterprise would like to index logs from several BlueCoat proxies in Splunk, but they have little or no control over these proxies or over what type of logging is enabled (i.e. which fields are being logged). What they would like to know is how Splunk can deal with BlueCoat logs whose format may change without advance notice.

There are several BCs within the enterprise, managed by people in different departments/business areas. These people may change the log file format as they see fit, without telling anyone. Currently logs are continuously sent to a central location (via syslog, I believe). From what I've been told, if you watch the data as it comes in, the change of format "just happens", and only after a little while does a message arrive stating what the new log file format is (this seems to be a BlueCoat flaw).

From a Splunk point of view this is not good, since field extractions will not work well, if at all.

What would be the best course of action in order to get the BC logs splunkified?

  • Enforce a single log format throughout the organization - which may be politically hard to do.
  • Establish a couple of 'approved' log file formats, and set up corresponding syslog receivers; i.e. if you change from approved-format-1 to approved-format-2, then you also must change the destination to which logs are sent. The files created by the different syslog receivers would then rather easily be mapped to different sourcetypes. Until someone forgets to follow procedure.
  • Something else, any ideas welcome 🙂

Thanks in advance,

Kristian


wbfoxii
Communicator

I have a similar problem, just with inertia in the operations department. I ended up creating a different event type for each format I encountered; I'm up to five. In our environment, all of our logs are first FTP'ed to BlueCoat Reporter, and I can pick them up from there with a forwarder. I ran head on the first 6 lines of a file, and the field order is provided there.
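For anyone who wants to see the header-driven idea outside Splunk, here is a minimal Python sketch. It assumes the file header contains a W3C ELFF-style "#Fields:" line listing the field names in order, and that values are space-delimited with double quotes around values containing spaces. The function name parse_elff and the sample field names are made up for illustration.

```python
import csv


def parse_elff(lines):
    """Yield one dict per event, keyed by the most recent '#Fields:' header.

    Assumption: field names come from a '#Fields:' header line and values
    are space-delimited, with double quotes around values containing spaces.
    """
    fields = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # A new header replaces the field order for all following events.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header/comment lines
        if fields is None:
            continue  # no field list seen yet, so the event cannot be mapped
        values = next(csv.reader([line], delimiter=" ", quotechar='"'))
        yield dict(zip(fields, values))


sample = [
    "#Software: SGOS",
    "#Fields: date time c-ip sc-status cs(User-Agent)",
    '2012-01-01 10:00:00 10.0.0.1 200 "Mozilla/5.0 (X11)"',
]
events = list(parse_elff(sample))
```

Because the field order is re-read whenever a new header line appears, a format change in the middle of a file is picked up from that point on, provided the header arrives before the events it describes.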

For each different one, I created an eventtype (LogFormat_A, LogFormat_B, etc.) based on the IP address of the proxy. They are in eventtypes.conf.

Then I created new stanzas in transforms.conf for each of the new event types and rearranged the fields in the REGEX to match each format.

Finally, in props.conf, I added all of the new LogFormat_x events to the REPORT-main statement.

That got me consistent parsing across the different formats.
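The logic of the steps above (pick a format from the proxy's IP address, then apply that format's field order) can be sketched in Python as follows. This is only an illustration of the idea, not the actual Splunk configuration; the IP addresses, field orders, and the extract function are hypothetical.

```python
# Hypothetical mapping from proxy IP address to a format label.
FORMAT_BY_PROXY = {
    "10.1.1.10": "LogFormat_A",
    "10.2.2.20": "LogFormat_B",
}

# Hypothetical field order for each format; only the order differs.
FIELD_ORDER = {
    "LogFormat_A": ["date", "time", "c-ip", "sc-status", "cs-uri"],
    "LogFormat_B": ["date", "time", "sc-status", "c-ip", "cs-uri"],
}


def extract(proxy_ip, line):
    """Return a dict of named fields, or None if the proxy is unknown."""
    fmt = FORMAT_BY_PROXY.get(proxy_ip)
    if fmt is None:
        return None
    values = line.split(" ")
    event = dict(zip(FIELD_ORDER[fmt], values))
    event["log_format"] = fmt
    return event


a = extract("10.1.1.10", "2012-01-01 10:00:00 192.168.0.5 200 http://example.com/")
b = extract("10.2.2.20", "2012-01-01 10:00:00 200 192.168.0.5 http://example.com/")
```

Both events end up with the same field names even though the columns arrive in a different order, which is the point of keeping one field list per format.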

When the Action is TCP_TUNNELED, it appears that the cs(Referer) field is left null. That causes the REGEX to fail, because the field is missing and the delimiter becomes two successive blanks. I don't know where to pursue that with BlueCoat.
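The failure is easy to reproduce in a small Python sketch. The three-field layout below is invented for illustration and is not the real BlueCoat format. A pattern that requires at least one non-blank character per field (\S+) rejects the line, while one that allows an empty field (\S*) accepts it. Splunk's REGEX uses PCRE rather than Python's re module, but \S+ and \S* mean the same thing in both.

```python
import re

# Hypothetical three-field layout: action, referer, status.
STRICT = re.compile(r"^(\S+) (\S+) (\S+)$")   # every field must be non-empty
LENIENT = re.compile(r"^(\S+) (\S*) (\S+)$")  # the middle field may be empty

# The empty referer leaves two successive blanks between the other fields.
line = "TCP_TUNNELED  200"

strict_match = STRICT.match(line)
lenient_match = LENIENT.match(line)
```

Here strict_match is None, while lenient_match captures an empty string for the middle field, so the remaining fields still line up.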


kristian_kolb
Ultra Champion

Dart: Well, yes, I'm quite aware of that, but since I think we would be dealing with data coming in over syslog, it would have to be done as I described in the question above: one syslog server (instance) per log format.

If you have 2 different log formats, say Basic and Full, you set up 2 syslog receivers, say udp/514 and udp/515. Then you configure your BCs to send to 514 when logging Basic, and to 515 when logging Full.

The syslog server(s) write files to different directory structures, and the forwarder(s) can easily set the sourcetype based on which file(s) it reads.
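As a rough Python sketch of that mapping (port to directory to sourcetype): the directory paths and sourcetype names below are hypothetical, and in practice the forwarder's input configuration would do this job rather than a script.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each syslog receiver writes under its own directory.
RECEIVERS = {
    514: {"directory": "/var/log/bluecoat/basic", "sourcetype": "bluecoat:basic"},
    515: {"directory": "/var/log/bluecoat/full", "sourcetype": "bluecoat:full"},
}


def sourcetype_for(path):
    """Return the sourcetype for a log file based on its parent directory."""
    p = PurePosixPath(path)
    for receiver in RECEIVERS.values():
        if PurePosixPath(receiver["directory"]) in p.parents:
            return receiver["sourcetype"]
    return None  # file is outside every known receiver directory


full = sourcetype_for("/var/log/bluecoat/full/proxy01/2012-01-01.log")
other = sourcetype_for("/var/log/other/messages.log")
```

The weak point is unchanged: the mapping is only right as long as each proxy actually sends to the port that matches its current log format.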

Any other ideas?


dart
Splunk Employee

Because if you could get files with the header, you could use:
http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime


dart
Splunk Employee

Can you get the logs as files, with the headers, instead of syslog streamed?


kristian_kolb
Ultra Champion

I was under the impression that BC normally logs like most web servers, i.e. in a CSV or TSV format. However, somebody suggested that it's possible to configure BC to log as key=value pairs. Could someone confirm?

BR, Kristian


Takajian
Builder

Could you explain why you need to extract fields in advance? As you know, we can search the data with any keyword, such as an IP address or a URL.
