Re: Protocol Data Inputs: How to process a stream ...

MatMeredith · ‎01-21-2015

I’d like to use the "Protocol Data Inputs" add-on to receive a stream of events over TCP and pre-process them before indexing them in Splunk. I assume that’s something I should be able to do?

My apologies for a rookie question, but I wonder if someone can help me please?
- I’ve written a message handler, and I’m successfully receiving the data as it is streamed over TCP.
- However, the message buffer I receive often contains partial events – e.g. the start of an event at the end of the buffer, with the remainder of the event arriving the next time the message handler is invoked.

I assume that’s expected and normal? Is there any easy way to reassemble the partial messages / any sample implementation that I could adapt? Presumably I need to save off the partial message and wait for the remainder to be received… Is there an easy way to persist data between invocations of the handler?

a_salikov · ‎02-11-2019

Hi everyone,
I'm having trouble with PDI: I can't configure a data input over WebSocket. Can you help me?

jingqin · ‎02-27-2019

I have the same question...

Damien_Dallimor · ‎01-21-2015

Your assumption is correct. That's one of the main design goals of Protocol Data Inputs: to be able to pre-process data via pluggable, polyglot (numerous programming languages) data handlers.

Are you sending large TCP data windows? Have you tried increasing the TCP receive buffer size?

paulwrussell · ‎01-07-2016

How do you get to those socket settings?

Damien_Dallimor · ‎01-07-2016

They are on the Data Inputs setup page for the Protocol Data Inputs stanza you are setting up.

Damien_Dallimor · ‎01-21-2015

Also, did you increase the SEND buffer on the client side to match the enlarged RECEIVE buffer in your Protocol Data Inputs setup on the server side?
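
For a client whose code you do control, the send buffer is usually requested through the standard `SO_SNDBUF` socket option. The following is a minimal Python sketch of that idea, not anything specific to PDI; the helper name `make_client_socket` and the 256 KB figure are purely illustrative assumptions.

```python
import socket


def make_client_socket(send_buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request a send buffer of the given size.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The operating system treats this value as a hint: it may round it,
    # double it, or cap it at a system-wide limit.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return sock


sock = make_client_socket(256 * 1024)  # illustrative size, not a recommendation
# Read the value back to see what the OS actually granted.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("effective send buffer (bytes):", effective)
sock.close()
```

Note that buffer sizes only influence how many bytes arrive per read. TCP is a byte stream, so a larger buffer does not by itself guarantee that each read lines up with event boundaries.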

MatMeredith · ‎01-21-2015

On the send side, the client opens a TCP connection to a port on Splunk and then writes a near continuous stream of events to Splunk. The events are variable length -- anywhere between 10 and 4000 bytes long and delimited by a specific set of characters. I don't have any control over the send buffer size.

On the receive side, as I understand it the TCP layer has no way of knowing what constitutes an event (?) -- presumably when the receive buffer fills up it will pass it on up the stack and it may contain 1 event, many events, or (likely) a number of whole events + partial events at the ends of the buffer?

Unless I'm missing something basic, it seems to me that I just need to manually handle reassembling events fragmented across consecutive receive buffers... Is that not true?

Damien_Dallimor · ‎01-21-2015

Without knowing or being able to control exactly how the client sends data (TCP settings, and socket behaviour such as flushing the stream at the demarcation points of each event or set of events), you are going to have to implement some logic in your handler. As you correctly put it, the PDI TCP handler does not know about events; it just receives raw bytes and passes them along to your custom data handler for pre-processing.
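
The kind of handler logic described above could be sketched roughly as follows. This is a generic Python illustration, not the PDI handler API: the class name `EventReassembler`, the `feed` method and the `\r\n` delimiter are all assumptions (the thread does not say which delimiter characters the client uses). It also assumes the handler can keep one object alive per TCP connection between invocations, which would need to be checked against how PDI instantiates handlers.

```python
from typing import List


class EventReassembler:
    """Accumulates raw TCP chunks and emits only complete, delimited events.

    Hypothetical helper: keep one instance per connection so the trailing
    partial event survives between handler invocations.
    """

    def __init__(self, delimiter: bytes) -> None:
        if not delimiter:
            raise ValueError("delimiter must not be empty")
        self._delimiter = delimiter
        self._pending = bytearray()  # bytes of the not-yet-complete event

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a received chunk and return every event completed by it."""
        self._pending.extend(chunk)
        parts = bytes(self._pending).split(self._delimiter)
        # The last element is whatever follows the final delimiter: either
        # an incomplete event or an empty string. Hold it back for next time.
        self._pending = bytearray(parts.pop())
        return parts


# Simulated receive buffers: an event, and even the delimiter itself,
# can be split across consecutive chunks.
reassembler = EventReassembler(delimiter=b"\r\n")  # assumed delimiter
chunks = [b"event-1\r\nevent-2\r\neve", b"nt-3\r", b"\nevent-4\r\n"]

events = []
for chunk in chunks:
    events.extend(reassembler.feed(chunk))

print(events)
```

Because the leftover bytes are joined with the next chunk before splitting, a delimiter that straddles two buffers is still recognised. In a real handler it would probably also be wise to cap the size of the pending buffer, so that a client that never sends a delimiter cannot grow it without bound.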

MatMeredith · ‎01-21-2015

Thanks Damien. Unfortunately I have tried increasing the receive buffer size and that didn't make any difference. In general, though, won't I always need to deal with fragmented events, or are you saying that it should be possible to avoid this issue for some reason?

Damien_Dallimor · ‎01-21-2015

Can you elaborate on what a "fragmented event" constitutes? Are they over a particular byte size, for example? Any specific details are helpful; I'm flying blind here.

Protocol Data Inputs: How to process a stream of events over TCP before indexing them in Splunk?
