I am writing a custom streaming search command using the Python SDK and the V2 Protocol. I followed the V2 protocol documentation closely, and have created a streaming command that should play nicely with distributed search... but for some reason, it is not being distributed to indexers. The "remoteSearch" field in the job inspector confirms this. Search head and indexers are running Splunk 6.5.
The command is declared as a streaming command in its .py file according to this convention:
import sys
from splunklib.searchcommands import dispatch, StreamingCommand, Configuration

@Configuration(distributed=True, local=False)
class testCommand(StreamingCommand):
    def stream(self, events):
        # transform each event in the chunk
        for event in events:
            ...
            yield event

dispatch(testCommand, sys.argv, sys.stdin, sys.stdout, __name__)
And the commands.conf looks like this:
[test_command]
filename = test_command.py
chunked = true
distributed = true
local = false
I realize that streaming commands should distribute to indexers automatically, but as soon as I noticed that wasn't happening, I started specifying these configuration parameters explicitly just to see if that made a difference. So far, it has not.
The "remoteSearch" field in the job inspector shows that this command is never offloaded to the indexer peers, and the poor performance I am seeing is consistent with that. Any thoughts on why that might be? Is there a configuration step I might have missed to make it distribute effectively?
Note that the command is deployed to the search head and indexers in the same manner, and that it can be run manually on each of the indexers as well as search head. In my search string, the command is simply being written to replace an existing distributed search command that does work, so its position within my particular search string should not be a problem.
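For anyone who wants to script the same check instead of eyeballing the job inspector, here is a minimal sketch. It assumes you have already copied the remoteSearch string out of the inspector; the helper names and the sample strings below are made up for illustration, and the pipe splitting is deliberately naive.

```python
def commands_in_search(search):
    """Return the command names found in a pipe-delimited search string.

    Naive split: pipes inside quoted strings or subsearches are not handled.
    """
    names = []
    for segment in search.split("|"):
        tokens = segment.split()
        if tokens:
            # The first token of each segment is treated as the command name.
            names.append(tokens[0].lower())
    return names


def runs_remotely(remote_search, command_name):
    """True if command_name appears as a command in the remoteSearch string."""
    return command_name.lower() in commands_in_search(remote_search)


# Hypothetical remoteSearch values, not copied from a real job.
distributed_example = "litsearch index=main | test_command | fields src_ip"
local_only_example = "litsearch index=main | fields src_ip"

print(runs_remotely(distributed_example, "test_command"))
print(runs_remotely(local_only_example, "test_command"))
```

If the command name never shows up in remoteSearch, the indexers are not running it, which matches what I am seeing.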
Did you ever figure this out? I have my command working in one Splunk instance and not another. Both are version 6.6.3.
The error I get during search time is "Search Factory: Unknown search command" from each indexer. Setting local = true in commands.conf works, but I don't want that either.
My question: https://answers.splunk.com/answers/586866/unknown-search-command-ldaptestconnection-when-con.html
Could this possibly be the reason:
@Configuration(distributed=True, local=False)
I built a custom GeneratingCommand recently; perhaps try the following configuration?
@Configuration(type='streaming', distributed=True)
or
@Configuration(streaming=True, local=False)
These are different configurations for the same outcome on different SCP versions -- more on this documentation page: http://docs.splunk.com/DocumentationStatic/PythonSDK/1.6.0/searchcommands.html#splunklib.searchcomma...
I have tried passing several combinations of arguments to the Configuration decorator, including those that you mention, but so far no luck.
For Streaming commands, both 'type' and 'streaming' are actually fixed parameters (as defined in splunklib/searchcommands/streaming_command.py). So attempting to specify them in the Configuration decorator throws an error.
I've also tried leaving @Configuration() empty, since splunklib's streaming_command.py indicates that 'streaming' should be fixed to True, and 'distributed' defaults to True in the V2 protocol. 'local' also defaults to False.
So in theory, no extra configuration should be needed to get this to distribute to indexers. Any other thoughts?
Yikes, that's about the end of my experience and you're right; I realized that those attributes applied to the GeneratingCommand class yesterday evening...
The command I built recently wouldn't get off the indexers until I changed the stream to 'reporting'. I did not want it distributed, since it calls back to our ITSI kvstore.
I looked through generating_command.py and it looks like declaring your command as a GeneratingCommand that is configured to operate in the streams pipeline may work? If that's not possible or doesn't work I got nothing 😕
Configuring it as a GeneratingCommand in the streams pipeline was an interesting idea, but it looks like it won't work for my use case. The command needs to perform ASN lookups on an existing set of events, but GeneratingCommands "must be the first command of a search." I did rewrite it as a GeneratingCommand just to see what would happen, but Splunk pitched that error in response.
Do you have any ideas about how we might get this issue more visibility? I think it's worth bringing up to some Splunk devs. If there are widespread distribution issues with streaming commands using splunklib, it's definitely something that should be addressed.
What search are you using to test your command? Is it the only command used after the event search? Can you share?
Distributable streaming commands may be prevented from running remotely, depending on search command order, i.e. they have to be placed before non-streaming commands.
I assume you have the command defined on the indexers as well as the search head?
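To illustrate the ordering point, here is a rough sketch of the rule as I understand it. This is a simplified model, not how Splunk actually plans a search: the function name and the set of distributable commands are assumptions for the example.

```python
def remote_prefix(pipeline, distributable):
    """Return the leading commands that could be pushed to the indexers.

    Simplified model: everything up to, but not including, the first
    command that is not distributable streaming stays in the remote part.
    """
    prefix = []
    for command in pipeline:
        if command not in distributable:
            break  # everything from here on runs on the search head
        prefix.append(command)
    return prefix


# Hypothetical set of distributable streaming commands for this example.
DISTRIBUTABLE = {"search", "eval", "rex", "test_command"}

# Custom command placed before the first non-streaming command (stats).
early = ["search", "test_command", "eval", "stats"]
# Custom command placed after stats, so it cannot be part of the remote part.
late = ["search", "stats", "test_command"]

print(remote_prefix(early, DISTRIBUTABLE))
print(remote_prefix(late, DISTRIBUTABLE))
```

In this model the command only has a chance of being distributed in the first search; in the second it is stuck behind stats.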
Thanks for the response! I can't share the exact command syntax, but trust that it is placed before all non-streaming commands. In fact, this custom command is being used in place of an older distributed streaming command (which does distribute correctly within the same larger search query, but is less efficient). The command is indeed defined on all indexers and deployed in the same manner to the search head and all indexers. I also verified that it can be run manually on each of the indexers.
OK, that's probably where my expertise ends... 😉
The only thing I noticed is that the current spec file for commands.conf does not contain the distributed=true|false option and there is also no default in etc/system/local. Hmmm...
I am not sure that is relevant. I'll see if I can dig up something else for you.
Okay thanks for continuing to look! I'm quite puzzled by why it doesn't work yet. But I'm definitely not seeing it in remoteSearch, and the performance is exactly as bad as it would be running on just the search head.