This should be fixed in Splunk Python SDK version 1.6.15 or later. Upgrading the SDK bundled with your app should be enough for all supported Splunk versions; there is no need to upgrade splunkd.

The problem was indeed related to the flush() method. The SDK's support for the chunked protocol appears to have been written assuming the protocol would support "partial chunks", allowing the response to one input message to be split across multiple output messages, with a partial: true flag indicating that the response continues in the next message. The code in the SDK that marked chunks as partial had been commented out, but the SDK still sent a partial chunk whenever a response reached maxresults rows (50,000 by default), just with no indication that it was to be interpreted as a partial response. This was a problem even for commands that simply returned the same number of rows they received, because the split happened whenever the limit was reached, even if it was never exceeded.

As a result, every time the script produced 50,000 records, the expected response was followed by an additional chunk, which, per the protocol, was taken as the response to the next request. (The protocol expects each request to have exactly one response.) Since the script produced these responses before reading the corresponding requests, it drifted further and further out of sync with the protocol: unread requests piled up in the stdin pipe and unmatched responses piled up in the stdout pipe, until both buffers were full and writes started to block or fail.

I considered adding a workaround to splunkd so that apps wouldn't need to update the SDK they use, but there was no reliable way to determine which commands needed the workaround, or which commands would be broken by it.

Anyway, if you're curious, the full fix (and a tiny bit of related cleanup) is in https://github.com/splunk/splunk-sdk-python/pull/301/files

Kudos to @kulick and @cpride_splunk for their early analysis of this bug!
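To make the desync concrete, here is a toy simulation of the mechanism described above. This is illustrative only, not the SDK's actual code; the names (buggy_responses, MAXRESULTS as a row count per chunk) are made up for the sketch. It models a protocol that pairs exactly one response chunk with each request, and a responder that emits an unflagged extra chunk every time a response reaches the limit:

```python
# Illustrative sketch (not the SDK's actual code) of why an unflagged
# extra chunk at the maxresults boundary desynchronizes a protocol that
# expects exactly one response per request.

from collections import deque

MAXRESULTS = 50_000  # SDK default row limit per output chunk

def buggy_responses(num_rows):
    """Yield response chunks (as row counts) for one request.

    The bug: when the row count reaches MAXRESULTS, the rows are flushed
    as one full chunk *and* a second chunk follows, with no partial flag
    to tell the reader that the two chunks form one logical response.
    """
    full_chunks, remainder = divmod(num_rows, MAXRESULTS)
    for _ in range(full_chunks):
        yield MAXRESULTS  # flush at the limit...
        yield 0           # ...followed by a spurious extra chunk (the bug)
    if remainder:
        yield remainder

requests = [10, 50_000, 100, 50_000]  # rows produced per request
pending = deque()                     # chunks buffered in the "stdout pipe"
answers = []
for rows in requests:
    pending.extend(buggy_responses(rows))
    answers.append(pending.popleft())  # reader pairs ONE chunk per request

print(answers)       # [10, 50000, 0, 100] -- third answer is the bogus chunk
print(len(pending))  # 2 chunks left over with no matching request
```

After the second request hits the limit, every later answer is off by one (the reader takes the spurious empty chunk as the answer to the third request), and leftover chunks accumulate in the output buffer, which is exactly how the real pipes filled up and writes began to block.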