Solved: Why doesn't this custom search command call class method?

plucas_splunk · 11-26-2016

Given this excerpt from a custom search command:

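import logging

# Splunk SDK for Python (splunklib) search-command classes
from splunklib.searchcommands import Configuration, EventingCommand
from splunklib.searchcommands.decorators import ConfigurationSetting

logger = logging.getLogger( 'nbclosest' )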
logger.setLevel( logging.DEBUG )

K_STAG  = 'stop_tag'
K_TIME  = '_time'
K_VDIST = 'vehicle_distance'
K_VID   = 'vehicle_id'

@Configuration()
class NextBusClosestStop( EventingCommand ):
    class ConfigurationSettings( EventingCommand.ConfigurationSettings ):
        required_fields = ConfigurationSetting(value=[K_TIME, K_VID, K_VDIST, K_STAG])

    def __init__( self ):
        super( NextBusClosestStop, self ).__init__()
        # ...

    def drain( self ):
        logger.debug( 'enter drain()' )
        # do drain code

    def transform( self, records ):
        logger.debug( 'enter transform()' )
        for rec in records:
            # ...
            yield rec

        logger.debug( 'exit transform()' )
        self.drain()

The transform() function is called, and both "enter transform()" and "exit transform()" appear in search.log, but "enter drain()" never does. The drain() code is indeed never executed, as the wrong results confirm.

However, if I copy & paste the code from drain() and put it "inline" in place of self.drain(), then the code executes.

How can it be the case that self.drain() isn't called?
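
For readers who want to see the symptom without Splunk, here is a minimal plain-Python reproduction. It assumes, as the accepted answer below reveals, that the real drain() contains a yield statement. The class name Cmd, the sample records, and the in-memory log list are hypothetical stand-ins for the search command and search.log.

```python
log = []  # hypothetical stand-in for search.log


class Cmd(object):
    def drain(self):
        # Because this function contains yield, calling it runs none of this body.
        log.append('enter drain()')
        yield {'drained': True}

    def transform(self, records):
        log.append('enter transform()')
        for rec in records:
            yield rec
        log.append('exit transform()')
        self.drain()  # only creates a generator object, which is then discarded


# The caller consumes transform() by iterating over it.
out = list(Cmd().transform([{'n': 1}, {'n': 2}]))
print(log)  # ['enter transform()', 'exit transform()']
print(out)  # [{'n': 1}, {'n': 2}]
```

Both log messages from transform() appear, while "enter drain()" never does, which matches what search.log showed.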

plucas_splunk · 11-26-2016

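After doing some more reading on yield, it turns out that putting yield into a sub-function such as drain() turns that function into a generator. Calling a generator function only creates a generator object; none of its body runs, which is why "enter drain()" was never logged. (The "inline" version worked because the pasted yield statements became part of transform(), which Splunk was already iterating over.) In order to get results out of it, one has to iterate over that generator: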

    def drain( self ):
        recs = self.vdict.values()
        for rec in sorted( recs, key=operator.itemgetter( K_TIME ) ):
            yield rec
        self.vdict.clear()
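
To see that the body of drain() runs only as someone iterates over it, here is a small sketch. The class Holder and its sample vdict contents are hypothetical; drain() mirrors the method above. Note that self.vdict.clear() also runs only after the caller has consumed every record.

```python
import inspect
import operator

K_TIME = '_time'


class Holder(object):
    def __init__(self):
        # hypothetical buffered records keyed by vehicle id
        self.vdict = {'v1': {K_TIME: 20}, 'v2': {K_TIME: 10}}

    def drain(self):
        recs = self.vdict.values()
        for rec in sorted(recs, key=operator.itemgetter(K_TIME)):
            yield rec
        self.vdict.clear()


h = Holder()
print(inspect.isgeneratorfunction(Holder.drain))  # True

gen = h.drain()      # creates a generator; no code in drain() has run yet
print(len(h.vdict))  # 2

first = next(gen)    # runs the body up to the first yield
print(first)         # {'_time': 10}
print(len(h.vdict))  # 2: clear() has not run yet

rest = list(gen)     # exhausts the generator, so clear() finally runs
print(rest)          # [{'_time': 20}]
print(len(h.vdict))  # 0
```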

Then, to call it from transform():

            for rec in self.drain():
                yield rec
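
Putting the pieces together, here is a plain-Python sketch of the corrected pattern: transform() buffers the incoming records and then re-yields everything drain() produces. The class name ClosestSketch, the record values, and the rule "keep the latest record per vehicle" are hypothetical simplifications; the real nbclosest.py logic is more involved.

```python
import operator

K_TIME = '_time'
K_VID = 'vehicle_id'


class ClosestSketch(object):
    def __init__(self):
        self.vdict = {}  # hypothetical: latest record seen per vehicle

    def drain(self):
        recs = self.vdict.values()
        for rec in sorted(recs, key=operator.itemgetter(K_TIME)):
            yield rec
        self.vdict.clear()

    def transform(self, records):
        for rec in records:
            self.vdict[rec[K_VID]] = rec  # buffer instead of yielding right away
        # Iterate over the generator so that drain()'s body actually runs.
        for rec in self.drain():
            yield rec


records = [
    {K_VID: 'a', K_TIME: 3},
    {K_VID: 'b', K_TIME: 1},
    {K_VID: 'a', K_TIME: 5},
]
out = list(ClosestSketch().transform(records))
print([r[K_TIME] for r in out])  # [1, 5]
```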

In Python >= 3.3, one can instead do:

            yield from self.drain()

but Splunk currently ships with Python 2.7.11.
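
On Python 3.3 or later, the two forms are interchangeable for this purpose, as this small sketch shows; it will not run on Python 2.7, where yield from is a syntax error. The generator names are hypothetical. (yield from also forwards send() and throw() to the inner generator, which the explicit loop does not, but that does not matter here.)

```python
def drain():
    for rec in ('r1', 'r2'):
        yield rec


def with_loop():
    yield 'first'
    for rec in drain():  # Python 2.7-compatible form
        yield rec


def with_yield_from():
    yield 'first'
    yield from drain()   # Python >= 3.3 only


print(list(with_loop()))        # ['first', 'r1', 'r2']
print(list(with_yield_from()))  # ['first', 'r1', 'r2']
```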

jkat54 · 11-26-2016

Looks like self.drain() is called from within self.transform()?

I see you yield results in a for loop too. Doesn't that cause the transform function to yield and exit when records exist?

I think you can fix it by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform().

jkat54 · 11-26-2016

Can you share your code or is this it?

https://github.com/paul-j-lucas/nextbus-util/blob/master/splunk/etc/apps/search/bin/nbclosest.py

plucas_splunk · 11-26-2016

Yes, that's the code.

jkat54 · 11-26-2016

I asked around for help, and someone carefully pointed out to me that the "exit transform()" log message appears as well, even though it's outside the loop. This is very interesting indeed.

plucas_splunk · 11-26-2016

Yes, that's the part that is most confusing.

jkat54 · 11-26-2016

http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do

I'm not sure what you mean by "inline", but my statement still holds some water. A function containing yield returns a generator object, and Python will run through the function until it hits the yield. As long as there is a records array to be had, you'll never see self.drain called. When you call self.transform without a records array, self.drain will happen because the body of "for rec in records:" doesn't execute and thus the yield doesn't take place.

plucas_splunk · 11-26-2016

By "inline" I mean that instead of calling drain(), I copy the code from drain() and paste it where I would otherwise call drain().

plucas_splunk · 11-26-2016

> Looks like self.drain() is called from within self.transform()?

Yes.

> I see you yield results in a for loop too.

Yes, as does every example I've ever seen.

> Doesn't that cause the transform function to yield and exit when records exist?

AFAIK, transform() is called with multiple records that Splunk sends in "chunks" (which is why this is called a "Chunked External Processor" in version 2 of the Python SDK). The transform() function then iterates over the records, doing something with them. For those it wishes to return to Splunk, it calls yield. However, despite its name, I doubt yield actually yields control, because then, somehow, the for loop would have to pick up where it left off the next time transform() is called. Hence, I believe yield is probably closer to "print."
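
As the accepted answer above shows, yield does in fact suspend the function. A quick way to check, sketched here with a hypothetical transform() and an events list standing in for the caller's activity, is to interleave the caller's steps with the generator's: after each yield, control returns to the caller, and the next step of iteration resumes the loop exactly where it stopped, because the generator object keeps its local state between steps. If yield behaved like print, every "transform:" line would come before every "caller:" line.

```python
events = []


def transform(records):
    for rec in records:
        events.append('transform: before yield ' + rec)
        yield rec
        events.append('transform: resumed after ' + rec)
    events.append('transform: loop finished')


for rec in transform(['r1', 'r2']):
    events.append('caller: got ' + rec)

for e in events:
    print(e)
```

The final "loop finished" entry also explains why "exit transform()" showed up in search.log: once the caller exhausts the generator, the code after the loop runs.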

> I think you can fix it by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform().

But the way the API works is that transform() is called by Splunk; you do not call it yourself.

And this doesn't explain why "inline" code would be executed while the call to the function would not.
