
Protocol Data Inputs: How to process a stream of events over TCP before indexing them in Splunk?

MatMeredith
Path Finder

I want to use the "Protocol Data Inputs" (PDI) add-on to receive a stream of events over TCP and pre-process them before indexing them in Splunk. I assume that's something I should be able to do?

My apologies for a rookie question, but I wonder if someone can help me, please?
- I've written a message handler, and I'm successfully receiving the data as it is streamed over TCP.
- However, the message buffer I receive often contains partial events: e.g. the start of an event at the end of the buffer, with the remainder of the event arriving the next time the message handler is invoked.

I assume that's expected and normal? Is there an easy way to reassemble the partial messages, or a sample implementation that I could adapt? Presumably I need to save off the partial message and wait for the remainder to be received. Is there an easy way to persist data between invocations of the handler?
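A minimal sketch of the usual approach, written in plain Python 3 rather than against PDI's handler API. It assumes every event ends with a fixed delimiter (`b"\r\n"` below is a placeholder for whatever "specific set of characters" the sender actually uses) and that your handler object stays alive between invocations, so an instance attribute can carry the unfinished tail from one buffer to the next. `EventReassembler` and `feed` are hypothetical names; you would call `feed` from whatever entry point your handler language exposes.

```python
class EventReassembler:
    """Hypothetical helper that turns arbitrary TCP chunks into whole events.

    Assumes every event ends with `delimiter`. Bytes after the last
    delimiter in the data seen so far are kept until the next call.
    """

    def __init__(self, delimiter=b"\r\n"):  # placeholder delimiter
        if not delimiter:
            raise ValueError("delimiter must be non-empty")
        self.delimiter = delimiter
        self.pending = b""  # partial event carried over between calls

    def feed(self, chunk):
        """Add one received buffer; return the events it completes, in order."""
        # Searching the combined bytes also catches a delimiter that was
        # itself split across two chunks.
        data = self.pending + chunk
        *complete, self.pending = data.split(self.delimiter)
        # Drop empty events produced by back-to-back delimiters.
        return [event for event in complete if event]


if __name__ == "__main__":
    r = EventReassembler(b"\r\n")
    print(r.feed(b"evt1\r\nev"))   # [b'evt1']
    print(r.feed(b"t2\r"))         # []
    print(r.feed(b"\nevt3\r\n"))   # [b'evt2', b'evt3']
```

Two things this sketch leaves out: if several clients can connect to the same input, you would want one reassembler per connection (assuming your handler can tell connections apart), and a production version should cap the size of `pending` (e.g. a little above the largest expected event) so a missing delimiter cannot make it grow without bound.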

a_salikov
Path Finder

Hi everyone,
I'm having trouble with PDI: I can't configure a data input over WebSocket. Can you help me?

jingqin
New Member

I have the same question...

Damien_Dallimor
Ultra Champion

Your assumption is correct. That's one of the main design goals of Protocol Data Inputs: to be able to pre-process data via pluggable, polyglot (numerous programming languages) data handlers.

Are you sending large TCP data windows? Have you tried increasing the TCP receive buffer size?
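For context, the receive buffer here corresponds to the socket option `SO_RCVBUF`. PDI exposes it on its setup page, so you would not normally write this yourself; the plain-Python sketch below only illustrates what the setting is and that the operating system may round, double, or cap the requested value (Linux, for example, doubles it for bookkeeping overhead). The 1 MiB figure and the `make_listener` name are arbitrary examples.

```python
import socket

REQUESTED_RCVBUF = 1024 * 1024  # arbitrary example: 1 MiB


def make_listener(port=0, rcvbuf=REQUESTED_RCVBUF):
    """Create a TCP listening socket with an enlarged receive buffer."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before listen(): per Linux tcp(7), buffer sizes must be set
    # before listen()/connect() to take effect on connections.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    srv.bind(("127.0.0.1", port))  # port 0 = let the OS pick a free port
    srv.listen()
    return srv


if __name__ == "__main__":
    with make_listener() as srv:
        granted = srv.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # The value actually granted may differ from the one requested.
        print("requested", REQUESTED_RCVBUF, "granted", granted)
```

Note that a larger buffer changes how much data can be queued before the reader drains it, not where the boundaries of each read fall, so it does not by itself stop events from being split.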

paulwrussell
Explorer

How do you get to those socket settings?

Damien_Dallimor
Ultra Champion

They are on the Data Inputs setup page for the Protocol Data Inputs stanza you are setting up.

Damien_Dallimor
Ultra Champion

Also, did you increase the SEND buffer on the client side to match the enlarged RECEIVE buffer in your Protocol Data Inputs setup on the server side?
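A minimal client-side sketch, only applicable if you control the sender (later in this thread the poster says they do not). `SO_SNDBUF` is the send-side counterpart of the receive buffer, and writing each complete event plus its terminator with `sendall()` ensures the application hands whole events to the kernel. The `send_events` name, the buffer size, and the delimiter are placeholders. Even with this, TCP may still split or coalesce the bytes in transit, so the receiver must still reassemble on the delimiter.

```python
import socket

DELIMITER = b"\r\n"             # placeholder event terminator
REQUESTED_SNDBUF = 1024 * 1024  # placeholder; e.g. match the receiver's setting


def send_events(host, port, events):
    """Hypothetical client: send each event followed by DELIMITER."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Set before connect() so it applies to this connection.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED_SNDBUF)
        sock.connect((host, port))
        for event in events:
            # sendall() keeps writing until every byte of the event and its
            # terminator has been handed to the kernel, or raises an error.
            sock.sendall(event + DELIMITER)
```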

MatMeredith
Path Finder

On the send side, the client opens a TCP connection to a port on Splunk and then writes a near-continuous stream of events to Splunk. The events are variable length (anywhere between 10 and 4000 bytes) and delimited by a specific set of characters. I don't have any control over the send buffer size.

On the receive side, as I understand it, the TCP layer has no way of knowing what constitutes an event (?). Presumably when the receive buffer fills up it passes the data up the stack, and a buffer may contain one event, many events, or (likely) a number of whole events plus partial events at either end of the buffer?

Unless I'm missing something basic, it seems to me that I just need to handle reassembling events fragmented across consecutive receive buffers myself. Is that right?
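This can be checked locally: a stream socket preserves the order of the bytes but not the boundaries of the writes. The sketch below writes three delimited events into a connected pair of stream sockets and reads them back with fixed 7-byte reads (an arbitrary size standing in for whatever the network and PDI happen to deliver). The chunks straddle event boundaries, and only concatenating them recovers the original stream. The event contents and the `b"\r\n"` delimiter are placeholders.

```python
import socket

events = [b"first event", b"second", b"a somewhat longer third event"]
payload = b"".join(e + b"\r\n" for e in events)  # placeholder delimiter

a, b = socket.socketpair()  # a connected pair of stream sockets
with a, b:
    a.sendall(payload)
    a.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
    chunks = []
    while True:
        chunk = b.recv(7)  # arbitrary small read size
        if not chunk:      # empty bytes means the writer has shut down
            break
        chunks.append(chunk)

for chunk in chunks:
    print(chunk)

# The concatenated chunks restore the stream, even though individual
# chunks cut events apart.
assert b"".join(chunks) == payload
```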

Damien_Dallimor
Ultra Champion

Without knowing or being able to control exactly how the client sends data (TCP settings, and socket behaviour such as flushing the stream at the demarcation point of each event or set of events), you are going to have to implement some logic in your handler. As you correctly put it, the PDI TCP handler does not know about events; it just receives raw bytes and passes them along to your custom data handler for pre-processing.

MatMeredith
Path Finder

Thanks, Damien. Unfortunately, I have tried increasing the receive buffer size and it didn't make any difference. In general, though, won't I always need to deal with fragmented events, or are you saying it should be possible to avoid this issue for some reason?

Damien_Dallimor
Ultra Champion

Can you elaborate on what constitutes a "fragmented event"? Are they over a particular byte size, for example? Any specific details are helpful; I'm flying blind here.