All Apps and Add-ons

Can Splunk ingest protobuf messages via the HEC endpoint?

charival
Observer

Hi Gurus,

 Greetings. Please advise whether the Splunk HEC endpoint can ingest protobuf messages, parse them using a configured schema, and convert them into a format compatible with indexers and search heads.

If an app for this already exists, please point me to it. If not, please share the app SDK documentation that would let me add custom logic before indexing, if applicable.

Thanks in advance!


tscroggins
Influencer

Hi,

While protobuf binary files can be processed (single-threaded) as archives using a monitor input and appropriate props.conf settings (invalid_cause, is_valid, and unarchive_cmd), HEC inputs send events directly to parsing queues.
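For reference, the archive-style approach looks roughly like this (a sketch; the source pattern and decoder script path are hypothetical, and the decoder must read protobuf from stdin and write text to stdout):

```
# props.conf (sketch -- source pattern and decoder path are hypothetical)
[source::....pb]
invalid_cause = archive
is_valid = False
unarchive_cmd = python3 /opt/decoders/thing_pb_to_text.py
```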

As a proof of concept outside Splunk, you can proxy protobuf messages to a HEC input using ncat and a conversion script.

For example, given a Thing prototype:

// Thing.proto

syntax = "proto2";

package Thing;

message Thing {
  required string timestamp = 1;
  required string foo = 2;
  required bool baz = 3;
  required int32 qux = 4;
}

we can compile Python bindings:

protoc --python_out=. ./Thing.proto

to produce Thing_pb2.py.
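If you want to craft test payloads without the generated bindings, the proto2 wire format for this message is simple enough to encode by hand. This sketch is illustrative only; `varint` and `encode_thing` are helpers written here, not part of protobuf's API:

```python
# Hand-rolled encoder for the Thing message's wire format,
# useful for generating test payloads without protoc.

def varint(n: int) -> bytes:
    # Protobuf base-128 varint: 7 bits per byte, MSB set on all but the last
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_thing(timestamp: str, foo: str, baz: bool, qux: int) -> bytes:
    def tag(num, wire):  # field tag = (field_number << 3) | wire_type
        return varint((num << 3) | wire)
    def string(num, s):  # wire type 2: length-delimited
        data = s.encode("utf-8")
        return tag(num, 2) + varint(len(data)) + data
    return (string(1, timestamp)
            + string(2, foo)
            + tag(3, 0) + varint(1 if baz else 0)   # wire type 0: varint
            + tag(4, 0) + varint(qux))
```

A payload produced this way parses identically to one built with `Thing_pb2` and serialized with `SerializeToString()`.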

We can then write a short script to read protobuf binary input from stdin and write JSON output to a HEC receiver:

#!/usr/bin/env python3
# $SPLUNK_HOME/bin/scripts/protobuf_to_hec.py

import json
import ssl
import sys
import Thing_pb2
import urllib.request

from google.protobuf.json_format import MessageToJson

message = Thing_pb2.Thing()
message.ParseFromString(sys.stdin.buffer.read())

# MessageToJson pretty-prints; round-trip through json to emit a compact,
# single-line event
event = json.dumps(json.loads(MessageToJson(message, including_default_value_fields=True, preserving_proto_field_name=True)))

req = urllib.request.Request(url="https://localhost:8088/services/collector/raw")
req.data = event.encode("utf-8")
req.method = "POST"
# example token value
req.add_header("Authorization", "Splunk b6708bbf-97e5-4065-a27d-42a37a812020")
req.add_header("Content-Type", "application/json")

# don't do this in production: a default SSLContext skips certificate
# verification
ctx = ssl.SSLContext()

urllib.request.urlopen(url=req, context=ctx)

For testing, our inputs and props are fairly simple. You'll want to modify timestamp extraction etc. as needed.

# inputs.conf

[http]
disabled = 0

[http://protobuf]
sourcetype = protobuf_json
# example token value
token = b6708bbf-97e5-4065-a27d-42a37a812020

# props.conf

[protobuf_json]
DATETIME_CONFIG = CURRENT
LINE_BREAKER = ([\r\n]+)\{
SHOULD_LINEMERGE = false
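Once the protobuf timestamp field carries real values, DATETIME_CONFIG = CURRENT could be replaced with an extraction along these lines (a sketch; the TIME_FORMAT assumes ISO 8601 values like 2023-04-08T01:37:52Z -- adjust to whatever your producer actually writes):

```
# props.conf (sketch)
[protobuf_json]
TIME_PREFIX = "timestamp"\s*:\s*"
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
MAX_TIMESTAMP_LOOKAHEAD = 32
LINE_BREAKER = ([\r\n]+)\{
SHOULD_LINEMERGE = false
```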

Finally, tie everything together with ncat to listen for incoming protobuf messages:

ncat --listen --keep-open --sh-exec "$SPLUNK_HOME/bin/scripts/protobuf_to_hec.py" 18088 &

"Thing" messages sent to 18088/tcp will be converted to JSON and posted to the HEC input. You can wrap ncat in a script and create a scripted input with an interval value of 0 to start ncat at Splunk startup and automatically restart the input if ncat exits.
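A minimal wrapper and scripted input stanza might look like this (a sketch; the wrapper script name is illustrative):

```
#!/bin/sh
# $SPLUNK_HOME/bin/scripts/start_protobuf_listener.sh (hypothetical)
exec ncat --listen --keep-open \
  --sh-exec "$SPLUNK_HOME/bin/scripts/protobuf_to_hec.py" 18088
```

```
# inputs.conf
[script://$SPLUNK_HOME/bin/scripts/start_protobuf_listener.sh]
interval = 0
```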


For production use, this could be re-written as a Python-based modular input that replaces ncat with a robust protobuf server implementation and the HEC input with native Splunk SDK functions.
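As a rough illustration of that direction, a persistent listener can be built on Python's standard socketserver module. This sketch keeps the one-message-per-connection model of the ncat setup; handle_payload and make_server are illustrative names, and handle_payload stands in for the Thing_pb2 parsing and Splunk SDK event-writing logic:

```python
#!/usr/bin/env python3
# Sketch of a persistent protobuf listener that a modular input could wrap.

import socketserver

received = []  # collected payloads; a real input would emit events instead

def handle_payload(data: bytes) -> None:
    # In a real modular input, parse here with Thing_pb2.Thing() and write
    # the JSON event through the Splunk SDK's event-writing APIs.
    received.append(data)

class ProtobufHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One serialized message per connection, as in the ncat example:
        # read until the client closes its end of the socket.
        data = self.rfile.read()
        if data:
            handle_payload(data)

def make_server(host="127.0.0.1", port=0):
    # port=0 picks a free port; use 18088 to mirror the ncat example
    return socketserver.ThreadingTCPServer((host, port), ProtobufHandler)
```

Calling make_server(port=18088).serve_forever() from the modular input's run loop would replace the ncat process entirely.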
