Splunk Search

Why doesn't the reducer get all events from the mapper function in my custom reporting command?

dcagatay
Explorer

I am trying to write a custom reporting command that finds the top words. It seems to work, but some of the data yielded by the mapper never reaches the reducer. For example, if each mapper invocation processes 10 events and yields 100 words, the reducer should receive 100 words times the number of mapper invocations, but it doesn't. Some of the words yielded by the mapper are not accessible in the reducer.

My mapper and reducer implementation is below.

@Configuration()
def map(self, records):
    self.logger.debug('TopWordsCommand.map')
    fieldname = self.field
    total = {}
    cnt = 0
    word_cnt = 0

    for record in records:
        text = record[fieldname]
        for word in text.split():
            if word in total:
                total[word] = int(total[word]) + 1
            else:
                total[word] = 1
            word_cnt += 1
        cnt += 1

    for word, count in total.items():  # items() works on Python 2 and 3
        yield { 'word': word, 'count': count }

    self.logger.info('Finished map. Processed {} events and {} words.'.format(cnt, word_cnt))

def reduce(self, records):
    self.logger.debug('TopWordsCommand.reduce')
    total = {}
    word_cnt = 0
    uniq_word_cnt = 0

    for record in records:
        word = record['word']
        count = record['count']
        word_cnt += 1

        if word in total:
            total[word] += int(count)
        else:
            total[word] = int(count)
            uniq_word_cnt += 1

    for word, count in total.items():  # items() works on Python 2 and 3
        yield { 'word': word, 'count': count }

    self.logger.info("Finished reduce. Total number of words {}, unique words {}".format(word_cnt, uniq_word_cnt))
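
One way to sanity-check the expected numbers outside Splunk is a small standalone simulation of the same counting logic. This is only a sketch: `map_chunk`, `reduce_all`, `simulate`, and `chunk_size` are hypothetical names made up for illustration, and the fixed-size chunking is an assumption, not a description of how Splunk actually splits events between mapper invocations. It shows that the mapper above yields one record per distinct word per invocation, so the number of records reaching the reducer is normally smaller than the number of words processed; the sum of the `count` values is the figure that should match.

```python
from collections import Counter


def map_chunk(records, fieldname):
    """Mimic the mapper: yield one record per distinct word in this chunk."""
    total = Counter()
    for record in records:
        # Missing fields are treated as empty text (an assumption).
        total.update(record.get(fieldname, '').split())
    for word, count in total.items():
        yield {'word': word, 'count': count}


def reduce_all(records):
    """Mimic the reducer: sum the partial counts for each word."""
    total = Counter()
    for record in records:
        total[record['word']] += int(record['count'])
    return dict(total)


def simulate(events, fieldname, chunk_size):
    """Run the mapper once per chunk, then feed every yielded record to the reducer."""
    mapped = []
    for start in range(0, len(events), chunk_size):
        mapped.extend(map_chunk(events[start:start + chunk_size], fieldname))
    return len(mapped), reduce_all(mapped)


events = [{'text': 'a b a'}, {'text': 'b c'}, {'text': 'a c c'}]
n_mapped, totals = simulate(events, 'text', chunk_size=2)
print(n_mapped)               # records handed to the reducer
print(sum(totals.values()))   # total words counted across all events
```

If the sum of counts logged in the reducer is lower than the sum of `word_cnt` logged across all mapper invocations, records really are being lost; if only the record count differs, the per-invocation aggregation explains it.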

DeronJensen
Explorer

I don't know if this is the issue but line 16:

         word_cnt += count

I think there are two different variables here, 'cnt' and 'count'. At line 16, 'count' is not defined.


dcagatay
Explorer

No, that is not the issue. I didn't post the actual code; what I posted is a very similar version meant to give the gist. The actual code doesn't produce that kind of error.
