Splunk Dev

Why doesn't this custom search command call class method?

plucas_splunk
Splunk Employee

Given this excerpt from a custom search command:

logger = logging.getLogger( 'nbclosest' )
logger.setLevel( logging.DEBUG )

K_STAG  = 'stop_tag'
K_TIME  = '_time'
K_VDIST = 'vehicle_distance'
K_VID   = 'vehicle_id'

@Configuration()
class NextBusClosestStop( EventingCommand ):
    class ConfigurationSettings( EventingCommand.ConfigurationSettings ):
        required_fields = ConfigurationSetting(value=[K_TIME, K_VID, K_VDIST, K_STAG])

    def __init__( self ):
        super( NextBusClosestStop, self ).__init__()
        # ...

    def drain( self ):
        logger.debug( 'enter drain()' )
        # do drain code

    def transform( self, records ):
        logger.debug( 'enter transform()' )
        for rec in records:
            # ...
            yield rec

        logger.debug( 'exit transform()' )
        self.drain()

The transform() function is called, and both "enter transform()" and "exit transform()" appear in search.log, but "enter drain()" is never logged. The drain code really is never executed: the results produced are wrong.

However, if I copy & paste the code from drain() and put it "inline" in place of self.drain(), then the code executes.

How can it be the case that self.drain() isn't called?

1 Solution

plucas_splunk
Splunk Employee

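After doing some more reading on yield, it turns out that putting yield into a function turns that function into a generator function. Calling it only creates a generator object; none of the body runs (not even the logger.debug() call at the top) until something iterates over that generator. So, in order to get results out of drain(), one has to iterate over it: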

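    def drain( self ):
        # Needs "import operator" at module level.
        recs = self.vdict.values()
        # Emit the buffered records in time order, then empty the buffer.
        for rec in sorted( recs, key=operator.itemgetter( K_TIME ) ):
            yield rec
        self.vdict.clear()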

Then to call it:

            for rec in self.drain():
                yield rec

In Python >= 3.3, one can instead do:

            yield from self.drain()

but Splunk currently ships with Python 2.7.11.
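
To see why the original self.drain() call did nothing, here is a minimal, self-contained sketch that needs no Splunk SDK. The Demo class, its pending list, the log list, and the transform_broken/transform_fixed names are all hypothetical stand-ins for the real command, chosen only for illustration. It should run on both Python 2.7 and Python 3.

```python
# Hypothetical stand-in for the search command; not the Splunk SDK.
log = []

class Demo(object):
    def __init__(self):
        self.pending = [{'_time': 2}, {'_time': 1}]

    def drain(self):
        # Because this body contains yield, drain() is a generator function.
        log.append('enter drain()')
        for rec in sorted(self.pending, key=lambda r: r['_time']):
            yield rec
        self.pending = []

    def transform_broken(self, records):
        for rec in records:
            yield rec
        self.drain()              # builds a generator object, then discards it

    def transform_fixed(self, records):
        for rec in records:
            yield rec
        for rec in self.drain():  # iterating is what runs drain()'s body
            yield rec

broken = list(Demo().transform_broken([{'_time': 0}]))
log_after_broken = list(log)      # still empty: drain()'s body never ran

fixed = list(Demo().transform_fixed([{'_time': 0}]))
log_after_fixed = list(log)       # now contains 'enter drain()'
```

This would also be consistent with the "inline" observation in the question: once the drain code is pasted into transform(), its yield statements belong to transform()'s own generator, which is presumably the one the caller iterates over.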

0 Karma

jkat54
SplunkTrust

Looks like self.drain() is called from within self.transform()?

I see you yield results in a for loop too. Doesn't that cause the transform function to yield and exit when records exist?

I think you can fix this by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform().

0 Karma

jkat54
SplunkTrust
SplunkTrust
0 Karma

plucas_splunk
Splunk Employee

Yes, that's the code.

0 Karma

jkat54
SplunkTrust

I went for help and had it carefully pointed out to me that the "exit transform()" log happens even though it's outside of the loop too. This is very interesting indeed.

0 Karma

plucas_splunk
Splunk Employee

Yes, that's the part that is most confusing.

0 Karma

jkat54
SplunkTrust

http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do

I'm not sure what you mean by inline, but my statement still holds some water. Yield returns a generator object and Python will iterate through the function until it hits the yield. As long as there is a records array to be had, you'll never see self.drain called. When you call self.transform without a records array, self.drain will happen because the "for rec in records:" doesn't execute and thus the yield doesn't take place.
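
One way to check this claim is a small standalone experiment. The sketch below uses a hypothetical transform() and a trace list in place of the real command and search.log. It suggests that code placed after the loop does run even when records is non-empty, as long as the consumer iterates the generator to the end, which matches "exit transform()" showing up in the log.

```python
# Hypothetical stand-in; the trace list plays the role of search.log.
trace = []

def transform(records):
    trace.append('enter transform()')
    for rec in records:
        yield rec
    # Reached once the loop is exhausted, not skipped because of the yields.
    trace.append('exit transform()')

# list() consumes the generator completely, like a caller that reads every result.
out = list(transform(['a', 'b']))
```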

0 Karma

plucas_splunk
Splunk Employee

By "inline" I mean that instead of calling drain(), I copy the code from drain() and paste a copy of it to where I call drain().

0 Karma

plucas_splunk
Splunk Employee

> looks like self.drain() is called from within self.transform()?

Yes.

> I see you yield results in a for loop too.

Yes, as does every example I've ever seen.

> Doesn't that cause the transform function to yield and exit when records exist?

AFAIK, transform() is called with multiple records that Splunk sends in "chunks" (which is why this is called a "Chunked External Processor" in version 2 of the Python SDK). The transform() function then iterates over the records, doing something with them. For those it wishes to return to Splunk, it uses yield. However, despite its name, I doubt yield actually yields control, because then the for loop would somehow have to pick up from where it left off the next time transform() is called. Hence, I believe yield is probably closer to "print."

> I think you can fix by calling self.drain() after you use "for x in self.transform():" instead of wrapping it up inside of self.transform()

But the way the API works is that transform() is called by Splunk; you do not call it yourself.

And this doesn't explain why "inline" code would be executed while the call to the function would not.
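
For what it's worth, plain Python does suspend a generator at each yield and resume it at the same spot when the consumer asks for the next item; the function is not called again. The sketch below is a minimal illustration with a hypothetical transform() and an events list. It says nothing about how Splunk itself drives the command.

```python
# Hypothetical stand-in; events records the order in which things happen.
events = []

def transform(records):
    for rec in records:
        events.append('produced %s' % rec)
        yield rec                                # execution pauses here
        events.append('resumed after %s' % rec)  # continues on the next request
    events.append('loop finished')

# A single call to transform(); the for loop pulls one item at a time.
for rec in transform([1, 2]):
    events.append('consumer got %s' % rec)
```

The events list ends up interleaved: each "produced" entry is followed by the consumer's entry and only then by the matching "resumed" entry, with "loop finished" last.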

0 Karma