On a Splunk universal forwarder, some of the monitored files contain binary data that we do not want to send to the indexers.
It seems impossible to prevent the applications on the server from logging these binary parts, so the data ends up in a log file that the forwarder monitors.
The problem is that the binary data is within an event, meaning that the file itself is not binary.
Is there any way to use NO_BINARY_CHECK on these files, or does that only apply to binary files and not to text files containing binary sections?
What would be the best way to remove the binary parts from the event before forwarding it to the indexers?
When the data enters the indexers it can be removed with SEDCMD, but to save bandwidth, and possibly indexing license, it would be nice if the binary part could be removed before it enters the indexers.
Have you tried this in props.conf?
[sourcetype_name]
NO_BINARY_CHECK = true

NO_BINARY_CHECK = [true|false]
* When set to true, Splunk processes binary files.
* Can only be used on the basis of [
Well, the file is not binary, and I don't want to process binary data... I want to get rid of it 😉
If there is specific data to remove from events, you can use SEDCMD with a regex.
Have a look at https://answers.splunk.com/answers/83790/how-do-i-remove-x00-characters-from-my-log-message.html
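A minimal sketch of what such a rule can look like, assuming the unwanted bytes are NUL characters (as the title of the linked question suggests). The stanza name sourcetype_name and the class name strip_nulls are placeholders, and the exact configuration from the linked answer is not reproduced here:

```ini
# props.conf on the instance that parses the data
# (an indexer or a heavy forwarder, not a universal forwarder)
[sourcetype_name]
# sed-style substitution: delete every NUL byte from the raw event
SEDCMD-strip_nulls = s/\x00//g
```

The general form is SEDCMD-<class> = s/<regex>/<replacement>/<flags>, so the regex has to be adapted to whatever marks the binary part in your own events.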
SEDCMD is applied before indexing, so the removed data won't count against your licence limit. However, if you're using a UF, the data will still be sent across the network.
You could consider installing a heavy forwarder (either directly on the source system or as an intermediate forwarder). The HF can do the preprocessing, which relieves your indexers of that workload.
I was a bit confused about SEDCMD and where it applies. It obviously belongs in props.conf, which the UF also reads... but apparently the UF does not support every setting available in props.conf.
That's more or less correct...
(and it is a bit confusing)
A UF can 'filter' (event routing, black/whitelisting etc)
An HF can 'filter' AND 'pre-process' (transform, sed, re-write)
Both can be configured in props/transforms/inputs/outputs.
The difference is that a UF is supposed to be lightweight, with a small footprint, so its features are limited. Heavy forwarders, as the name implies, can do a bit more.
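As a rough sketch of the intermediate heavy forwarder setup described above: the UF sends its data to the HF, and the HF applies the SEDCMD before passing events on to the indexers. The host name hf.example.com, port 9997, the group name intermediate_hf, the stanza name sourcetype_name and the control-character regex are all assumptions chosen for illustration, not values from this thread:

```ini
# outputs.conf on the universal forwarder:
# send everything to the intermediate heavy forwarder
[tcpout]
defaultGroup = intermediate_hf

[tcpout:intermediate_hf]
server = hf.example.com:9997

# props.conf on the heavy forwarder:
# the HF parses the events, so SEDCMD takes effect here
[sourcetype_name]
# illustrative rule only: drop runs of non-printable control characters
SEDCMD-strip_control = s/[\x00-\x08\x0B\x0C\x0E-\x1F]+//g
```

Keep in mind that events parsed on the HF arrive at the indexers already parsed, so the rule has to live on the HF. With an intermediate HF, the bandwidth saving only applies to the HF-to-indexer leg.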
Just in case someone has a similar problem:
Since an intermediate heavy forwarder was not an option right now, I added a SEDCMD regex to props.conf on the indexers, under the sourcetype stanza that these files belong to.
The regex itself will of course vary depending on what and how the application server logs, but in my case a Content-Type: application/octet-stream line marked the start of the binary part. The binary data then continued to the end of the event, which gave:
SEDCMD-filterbinary = s/(?ms)(?<=Content-Type: application\/octet-stream)(.*$)/\n**Removed binary part**/g
which leaves the Content-Type: application/octet-stream line in the log and replaces the rest with
**Removed binary part**
Your mileage may vary! 😉