How to remove binary data from events in files monitored by a Splunk forwarder?

nsommars
Explorer

Hi!
On a Splunk universal forwarder, some of the monitored files contain binary data that we do not want to send to the indexers.
It seems impossible to prevent the applications on the server from logging these binary parts, so the data ends up in a log file that the forwarder monitors.

The problem is that the binary data is within an event, meaning that the file itself is not binary.

Is there any way to use the props.conf directive NO_BINARY_CHECK on these files, or does that only apply to binary files, and not to text files containing binary sections?

What would be the best way to remove the binary parts from the event before forwarding it to the indexers?

When the data enters the indexers it can be removed with SEDCMD, but to save bandwidth, and possibly indexing license, it would be nice if the binary part could be removed before it enters the indexers.

Any ideas?

1 Solution

nsommars
Explorer

Just in case someone has a similar problem:
Since an intermediate heavy forwarder was not an option right now, I added a SEDCMD regexp in props.conf on the indexer servers, under the sourcetype stanza that these files belong to.
The regexp itself will of course vary depending on what the application server logs and how, but in my case a Content-Type: application/octet-stream line marked the start of the binary part. The binary data then continued to the end of the event, which gave:

SEDCMD-filterbinary = s/(?ms)(?<=Content-Type: application\/octet-stream)(.*$)/\n**Removed binary part**/g

which leaves the Content-Type: application/octet-stream line in the log and replaces the rest with **Removed binary part**.

Your mileage may vary! 😉
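
For anyone adapting this, here is a minimal sketch of how the setting might sit in props.conf. The stanza name my_app_logs is a placeholder, not the real sourcetype from this thread; the SEDCMD line is the one above, unchanged.

```
# props.conf on the indexers
# [my_app_logs] is a hypothetical sourcetype name; use the one your files are assigned.
[my_app_logs]
# Keep the Content-Type line and replace everything after it,
# up to the end of the event, with a short marker.
SEDCMD-filterbinary = s/(?ms)(?<=Content-Type: application\/octet-stream)(.*$)/\n**Removed binary part**/g
```

The setting only affects data indexed after it is in place; events that are already indexed are not rewritten.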

nickhills
Ultra Champion

Glad you sorted it, and thanks for posting back your solution!

If my comment helps, please give it a thumbs up!

nickhills
Ultra Champion

SEDCMD is applied before indexing, so the removed data won't count against your licence limit. However, if you're using a UF, it will still be sent across the network.

You could consider installing a heavy forwarder (either directly on the source system or as an intermediate forwarder). The HF can do the preprocessing, which relieves your indexers of the workload.
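
A rough sketch of the intermediate-forwarder layout, assuming a heavy forwarder reachable as hf.example.com that listens on port 9997, and a sourcetype called my_app_logs. The hostname, port, group name and sourcetype are all placeholders, not values from this thread.

```
# outputs.conf on the universal forwarder: send to the heavy forwarder
# instead of directly to the indexers
[tcpout]
defaultGroup = intermediate_hf

[tcpout:intermediate_hf]
server = hf.example.com:9997

# inputs.conf on the heavy forwarder: accept traffic from forwarders
[splunktcp://9997]
disabled = 0

# props.conf on the heavy forwarder: strip the binary part while parsing
[my_app_logs]
SEDCMD-filterbinary = s/(?ms)(?<=Content-Type: application\/octet-stream)(.*$)/\n**Removed binary part**/g
```

The heavy forwarder also needs its own outputs.conf pointing at the indexers. Note that with an intermediate HF the binary data still travels from the UF to the HF; only an HF on the source system itself keeps it off the network entirely.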


nsommars
Explorer

Thanks!
I was a bit confused about SEDCMD and where it applies. It obviously belongs in props.conf, which the UF also reads, but apparently the UF does not support every setting available in props.conf.


nickhills
Ultra Champion

That's more or less correct...
(and it is a bit confusing)

A UF can 'filter' (event routing, black/whitelisting, etc.).
An HF can 'filter' AND 'pre-process' (transform, sed, rewrite).
Both are configured in props/transforms/inputs/outputs.

The difference is that a UF is supposed to be lightweight with a small footprint, so its features are limited. Heavy forwarders, as the name implies, can do a bit more.


493669
Super Champion

Have you tried this in props.conf?

[sourcetype_name]
NO_BINARY_CHECK = true

NO_BINARY_CHECK = [true|false]
* When set to true, Splunk processes binary files.
* Can only be used on the basis of [<sourcetype>], or [source::<source>],
  not [host::<host>].
* Defaults to false (binary files are ignored).
* This setting applies at input time, when data is first read by Splunk.
The setting is used on a Splunk system that has configured inputs
acquiring the data.


nsommars
Explorer

Well, the file is not binary, and I don't want to process binary data... I want to get rid of it 😉


493669
Super Champion

If there is specific data to remove from events, you can use SEDCMD with a regex that matches it.
Have a look at https://answers.splunk.com/answers/83790/how-do-i-remove-x00-characters-from-my-log-message.html
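
As a sketch of what that could look like for stray NUL bytes, assuming a sourcetype named my_app_logs (a placeholder) and that the unwanted data really is \x00 characters rather than an arbitrary binary payload:

```
# props.conf on the first full Splunk instance that parses the data
# (an indexer or a heavy forwarder, not a universal forwarder)
# [my_app_logs] is a hypothetical sourcetype name.
[my_app_logs]
# Delete every NUL byte from the event text.
SEDCMD-strip_nul = s/\x00//g
```

The class name after SEDCMD- (here strip_nul) is arbitrary; it only needs to be unique within the stanza.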
