Protocol Data Inputs: How to process a stream of events over TCP before indexing them in Splunk?

Path Finder

I want to use the Protocol Data Inputs add-on to receive a stream of events over TCP and pre-process them before indexing them in Splunk. I assume that's something I should be able to do?

My apologies for a rookie question, but I wonder if someone can help me please?
- I’ve written a message handler, and I’m successfully receiving the data as it is streamed over TCP.
- However, the message buffer I receive often contains partial events, e.g. the start of an event at the end of the buffer, with the remainder of the event arriving the next time the message handler is invoked.

I assume that's expected and normal? Is there an easy way to reassemble the partial messages, or a sample implementation that I could adapt? Presumably I need to save the partial message and wait for the remainder to be received. Is there an easy way to persist data between invocations of the handler?

Path Finder

Hi everyone,
I'm having trouble with PDI: I can't configure a data input over WebSocket. Can you help me?

New Member

I have the same question...

Ultra Champion

Your assumption is correct. That's one of the main design goals of Protocol Data Inputs: to be able to pre-process data via pluggable, polyglot data handlers (handlers written in any of numerous programming languages).

Are you sending large TCP data windows? Have you tried increasing the TCP receive buffer size?

Explorer

How do you get to those socket settings?

Ultra Champion

They are on the Data Inputs setup page for the Protocol Data Inputs stanza you are setting up.

Ultra Champion

Also, did you increase the SEND buffer on the client side to match the enlarged RECEIVE buffer in your Protocol Data Inputs setup on the server side?
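
For readers who do control the sending side, here is a minimal sketch of requesting a larger send buffer on a client socket in Python. The 256 KiB figure is purely illustrative and not a recommended value, and the operating system is free to round or cap whatever is requested, so the code reads the effective size back rather than assuming it.

```python
import socket

# Illustrative size only; pick a value that matches the receive buffer
# configured on the server side.
REQUESTED_SEND_BUFFER = 256 * 1024

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the OS for a larger send buffer. This is a request, not a guarantee.
client.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED_SEND_BUFFER)

# Read back what the OS actually granted (it may differ from the request).
effective_send_buffer = client.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

client.close()
```

Note that buffer sizes only influence how much data can be in flight; they do not make read boundaries line up with event boundaries.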

Path Finder

On the send side, the client opens a TCP connection to a port on Splunk and then writes a near continuous stream of events to Splunk. The events are variable length -- anywhere between 10 and 4000 bytes long and delimited by a specific set of characters. I don't have any control over the send buffer size.

On the receive side, as I understand it the TCP layer has no way of knowing what constitutes an event (?) -- presumably when the receive buffer fills up it will pass it on up the stack and it may contain 1 event, many events, or (likely) a number of whole events + partial events at the ends of the buffer?

Unless I'm missing something basic, it seems to me that I just need to manually handle reassembling events fragmented across consecutive receive buffers... Is that not true?
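
To make the situation concrete, here is a small simulation (not PDI code) of a delimited event stream being delivered in fixed-size reads. The delimiter `|~|`, the sample events, and the 16-byte read size are all made up for illustration; real reads vary in size, but the effect is the same.

```python
DELIMITER = b"|~|"  # hypothetical delimiter, stands in for the real one

def chunk_stream(stream: bytes, read_size: int):
    """Yield the stream in fixed-size pieces, roughly as successive reads might."""
    for start in range(0, len(stream), read_size):
        yield stream[start:start + read_size]

# Three variable-length events, each followed by the delimiter.
events = [b"event-one", b"event-two-is-longer", b"event-three"]
stream = b"".join(event + DELIMITER for event in events)

# The transport only sees bytes, so reads cut through events and delimiters.
chunks = list(chunk_stream(stream, 16))

for chunk in chunks:
    print(chunk)
```

In this run the first read ends partway through the second event, and the second read ends in the middle of a delimiter, so the receiver cannot treat any single buffer as a set of whole events.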

Ultra Champion

Without knowing or being able to control exactly how the client is sending data (its TCP settings, and socket behaviour such as flushing the stream at the demarcation points of each event or set of events), you are going to have to implement some logic in your handler. As you correctly put it, the PDI TCP handler does not know about events; it just receives raw bytes and passes them along to your custom data handler for pre-processing.
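
As a rough sketch of that handler logic, the class below keeps the unfinished tail of each buffer and prepends it to the next one. It is plain Python and does not use any PDI API; `EventReassembler`, `feed`, and the `|~|` delimiter are hypothetical names chosen for this example. It assumes one instance is kept alive between handler invocations (for example as a field on the handler object), and one instance per connection if several clients send at once.

```python
from typing import List


class EventReassembler:
    """Buffers raw bytes and returns only complete, delimiter-terminated events."""

    def __init__(self, delimiter: bytes):
        if not delimiter:
            raise ValueError("delimiter must not be empty")
        self._delimiter = delimiter
        self._pending = bytearray()  # bytes received but not yet terminated

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add one received buffer and return any events it completes."""
        self._pending.extend(chunk)
        # Everything before the last delimiter is complete; the final piece
        # is a partial event (possibly empty) held back for the next call.
        *complete, remainder = bytes(self._pending).split(self._delimiter)
        self._pending = bytearray(remainder)
        # Drop empty pieces produced by back-to-back delimiters.
        return [event for event in complete if event]


reassembler = EventReassembler(b"|~|")
first = reassembler.feed(b"event-one|~|even")
second = reassembler.feed(b"t-two-is-longer|")
third = reassembler.feed(b"~|event-three|~|")
```

Because the held-back tail is re-scanned together with the next buffer, a delimiter that is itself split across two buffers is still found. A production version would probably also cap the size of the pending buffer so that a missing delimiter cannot grow it without bound.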

Path Finder

Thanks Damien. Unfortunately, I have tried increasing the receive buffer size and it didn't make any difference. In general, though, won't I always need to deal with fragmented events, or are you saying that it should be possible to avoid this issue for some reason?

Ultra Champion

Can you elaborate on what constitutes a "fragmented event"? Are they over a particular byte size, for example? Any specific details are helpful; I'm flying blind here.
