
Chunked=True SmartStreamingCommand to support large data sets: requires a change to internals.py (remove the flush at maxresultrows)

onthebay
Path Finder

I am trying to support large datasets (1 million+ rows) using custom commands with chunked=true.

I implemented SmartStreamingCommand as described in "How come custom search commands (CSC) SCPv2 cannot handle large event sets".

I then hit an issue where an SCPv2 generating command feeding an SCPv2 streaming command dropped records. To resolve it, I removed the partial=True flush in internals.py so that SmartStreamingCommand works consistently.

I deleted the following from internals.py for use in chunked commands:

    if self._record_count >= self._maxresultrows:
        self.flush(partial=True)

See:
- https://github.com/splunk/splunk-sdk-python/pull/251
- https://github.com/splunk/splunk-sdk-python/issues/150
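
For context, the effect of that check can be sketched with a toy writer. This is not the splunklib class: `ToyRecordWriter`, `flush_at_limit`, and `chunk_sizes` are hypothetical names used only for illustration, and the real writer serializes chunks to the parent process instead of collecting them in a list.

```python
class ToyRecordWriter:
    """Toy stand-in for a chunked record writer (hypothetical, not splunklib)."""

    def __init__(self, maxresultrows, flush_at_limit=True):
        self.maxresultrows = maxresultrows
        self.flush_at_limit = flush_at_limit
        self._buffer = []
        self.chunks = []  # each entry: (partial, [records])

    def write_record(self, record):
        self._buffer.append(record)
        # Models the check quoted above: emit a partial chunk once the
        # buffered record count reaches maxresultrows.
        if self.flush_at_limit and len(self._buffer) >= self.maxresultrows:
            self.flush(partial=True)

    def flush(self, partial=False):
        self.chunks.append((partial, self._buffer))
        self._buffer = []


def chunk_sizes(total, maxresultrows, flush_at_limit):
    """Write `total` records and return the size of every emitted chunk."""
    writer = ToyRecordWriter(maxresultrows, flush_at_limit)
    for i in range(total):
        writer.write_record({"n": i})
    writer.flush(partial=False)  # final flush at end of input
    return [len(records) for _, records in writer.chunks]


print(chunk_sizes(10, 4, flush_at_limit=True))   # [4, 4, 2]
print(chunk_sizes(10, 4, flush_at_limit=False))  # [10]
```

In this toy model, removing the check means every record stays buffered until the final flush, so memory use grows with the size of the result set.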

What are the ramifications of this change?


kulick
Path Finder

I have attempted to handle this issue in a more recent change to the original patch.

Details here: https://github.com/splunk/splunk-sdk-python/compare/master...TiVo:large-scale-custom-cmds

Let me know if that resolves the issues you were seeing. Good luck! 🙂


onthebay
Path Finder

When I remove the partial=True flush, I no longer need SmartStreamingCommand to process records. All records sent by Splunk are available to the custom command and are returned to Splunk for the next SPL command. When you have a moment, could you try this approach and see if I am missing something?

SmartStreamingCommand only worked if the SPL pipeline feeding it included no other SCPv2 custom commands.


kulick
Path Finder

Sorry, I didn't see your follow-up here originally.

I think that the underlying bug I am working around in the patch referenced above behaves differently depending on many factors, for example:

- the custom command's ratio of input events to output events (does it turn each input event into multiple output events?)
- the Splunk architecture (are there many indexers serving the query, or just one?)
- the SPL structure (does the search start from a generating command or from a real search?)

Depending on these differences, the bug may or may not be triggered.

I didn't completely follow your request here, but I do not believe that simply adding or removing a flush() call at the right time will work around the current Splunk daemon bug. Instead, the custom command must carefully manage the timing of how it collects and returns information to its Splunk parent process to avoid the bug...
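
As a rough illustration of what "managing the timing" could mean, here is a minimal sketch that assumes the goal is to return exactly one capped response for each input chunk received and to hold any extra output for later. This is not the code from the patch: `paced_responses`, `transform`, and `max_out` are hypothetical names, and the sketch does not model how leftover records are delivered in the real protocol.

```python
from collections import deque


def paced_responses(input_chunks, transform, max_out):
    """Yield one response per input chunk, then drain what is left.

    `transform` maps one input event to a list of output records.
    `max_out` caps how many records a single response may carry.
    """
    pending = deque()
    for chunk in input_chunks:
        for event in chunk:
            pending.extend(transform(event))
        # Reply once to this input chunk, with at most max_out records.
        yield [pending.popleft() for _ in range(min(max_out, len(pending)))]
    # Input is exhausted: release the backlog in capped batches.
    while pending:
        yield [pending.popleft() for _ in range(min(max_out, len(pending)))]


def fan_out(event):
    # Example transform: each input event becomes three output records.
    return [event, event * 10, event * 100]


for response in paced_responses([[1, 2], [3]], fan_out, max_out=4):
    print(response)
```

The point of the sketch is only that output is decoupled from the moment it is produced: records wait in `pending` until a response is due, rather than being flushed as soon as a row limit is hit.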
