Given this excerpt from a custom search command:

    logger = logging.getLogger( 'nbclosest' )
    logger.setLevel( logging.DEBUG )

    K_STAG = 'stop_tag'
    K_TIME = '_time'
    K_VDIST = 'vehicle_distance'
    K_VID = 'vehicle_id'

    @Configuration()
    class NextBusClosestStop( EventingCommand ):

        class ConfigurationSettings( EventingCommand.ConfigurationSettings ):
            required_fields = ConfigurationSetting(value=[K_TIME, K_VID, K_VDIST, K_STAG])

        def __init__( self ):
            super( NextBusClosestStop, self ).__init__()
            # ...

        def drain( self ):
            logger.debug( 'enter drain()' )
            # do drain code

        def transform( self, records ):
            logger.debug( 'enter transform()' )
            for rec in records:
                # ...
                yield rec
            logger.debug( 'exit transform()' )
            self.drain()

The transform() function is called and both 'enter transform()' and 'exit transform()' are in search.log, but I never see 'enter drain()' logged, and the code is indeed never called (because the results produced are wrong).
However, if I copy & paste the code from drain() and put it "inline" in place of self.drain(), then the code executes.
How can it be the case that self.drain() isn't called?
After doing some more reading on yield, it turns out that putting yield into a sub-function turns that function into a generator. Calling it only creates a generator object and runs none of the function's body, so in order to get results out of it, one has to iterate over that generator:

    def drain( self ):
        recs = self.vdict.values()
        for rec in sorted( recs, key=operator.itemgetter( K_TIME ) ):
            yield rec
        self.vdict.clear()

Then to call it:

    for rec in self.drain():
        yield rec

In Python >= 3.3, one can instead do:

    yield from self.drain()

but Splunk currently ships with Python 2.7.11.
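
The behavior can be reproduced outside Splunk. What follows is a minimal standalone sketch, not the actual nbclosest.py code: the function names and the `calls` list are hypothetical, and it uses only plain Python (it should run on both 2.7 and 3.x). It illustrates that calling a generator function without iterating over it runs none of its body, whereas iterating over it does.

```python
calls = []  # hypothetical record of which function bodies actually ran

def drain():
    # The yield below makes drain() a generator function.
    calls.append('drain')
    yield 'leftover'

def transform_broken(records):
    for rec in records:
        yield rec
    drain()  # only creates a generator object and discards it; the body never runs

def transform_fixed(records):
    for rec in records:
        yield rec
    for rec in drain():  # iterating is what runs the body of drain()
        yield rec

broken = list(transform_broken(['a', 'b']))
calls_after_broken = list(calls)  # snapshot before running the fixed version
fixed = list(transform_fixed(['a', 'b']))
```

In this sketch, `broken` contains only the two input records and `calls_after_broken` is empty, while `fixed` also contains the record produced by drain().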
Looks like self.drain() is called from within self.transform()?
I see you yield results in a for loop too. Doesn't that cause the transform function to yield and exit when records exist?
I think you can fix it by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform()
Can you share your code or is this it?
https://github.com/paul-j-lucas/nextbus-util/blob/master/splunk/etc/apps/search/bin/nbclosest.py
Yes, that's the code.
I asked for help, and it was carefully pointed out to me that the 'exit transform()' log message appears even though it's outside of the loop too. This is very interesting indeed.
Yes, that's the part that is most confusing.
http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do
I'm not sure what you mean by inline, but my statement still holds some water. Yield returns a generator object and Python will iterate through the function until it hits the yield. As long as there is a records array to be had, you'll never see self.drain called. When you call self.transform without a records array, self.drain will happen because the "for rec in records:" doesn't execute and thus the yield doesn't take place.
By "inline" I mean that instead of calling drain(), I copy the code from drain() and paste a copy of it to where I call drain().
Looks like self.drain() is called from within self.transform()?
Yes.
I see you yield results in a for loop too.
Yes, as does every example I've ever seen.
Doesn't that cause the transform function to yield and exit when records exist?
AFAIK, transform() is called with multiple records that Splunk sends in "chunks" (which is why this is called a "Chunked External Processor" in version 2 of the Python SDK). The transform() function then iterates over the records doing something with them. For those it wishes to return to Splunk, it calls yield. However, despite its name, I doubt yield actually yields control because then, somehow, the for loop would have to pick up from where it left off the next time transform() is called. Hence, I believe yield is probably closer to "print."
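
For what it's worth, a small standalone experiment, independent of Splunk and the SDK, suggests that yield does suspend the function and that the loop picks up where it left off when the caller asks for the next value. The names below (`transform`, `events`) are hypothetical and only illustrate plain Python generator behavior; how the SDK itself consumes transform() is not shown here.

```python
events = []  # hypothetical trace of the order in which things happen

def transform(records):
    events.append('enter')
    for rec in records:
        events.append('before yield %s' % rec)
        yield rec  # suspends here until the caller requests the next value
        events.append('after yield %s' % rec)
    events.append('exit')  # runs only when the caller exhausts the generator

gen = transform(['a', 'b'])
events.append('created')  # nothing inside transform() has run yet
first = next(gen)         # runs the body up to the first yield
events.append('got %s' % first)
rest = list(gen)          # resumes after the first yield and runs to the end
```

If the caller keeps pulling values until the generator is exhausted, the code after the loop runs too, which would be consistent with 'exit transform()' appearing in search.log.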
I think you can fix it by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform()
But the way the API works is that transform() is called by Splunk; you do not call it yourself.
And this doesn't explain why "inline" code would be executed while the call to the function would not.