Why doesn't the reducer get all events from the mapper function in my custom reporting command?

dcagatay
Explorer

I am trying to write a custom reporting command that finds the top words. It seems to work, but some of the data yielded by the mapper never reaches the reducer. For example, if each mapper invocation processes 10 events and yields 100 words, the reducer should receive 100 words times the number of mapper invocations, but that doesn't happen. Some of the words yielded by the mapper are never seen by the reducer.

My mapper and reducer implementation is below.

@Configuration()
def map(self, records):
    self.logger.debug('TopWordsCommand.map')
    fieldname = self.field
    total = {}
    cnt = 0
    word_cnt = 0

    for record in records:
        text = record[fieldname]
        for word in text.split():
            if word in total:
                total[word] = int(total[word]) + 1
            else:
                total[word] = 1
            word_cnt += 1
        cnt += 1

    # items() works on both Python 2 and 3; iteritems() is Python 2 only
    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info('Finished map. Processed {} events and {} words.'.format(cnt, word_cnt))

def reduce(self, records):
    self.logger.debug('TopWordsCommand.reduce')
    total = {}
    word_cnt = 0
    uniq_word_cnt = 0

    for record in records:
        word = record['word']
        count = record['count']
        word_cnt += 1

        if word in total:
            total[word] += int(count)
        else:
            total[word] = int(count)
            uniq_word_cnt += 1

    # items() works on both Python 2 and 3; iteritems() is Python 2 only
    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info("Finished reduce. Total number of words {}, unique words {}".format(word_cnt, uniq_word_cnt))
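
To check the expectation described above outside of Splunk, here is a minimal standalone sketch that mimics the map and reduce steps on plain dictionaries. It does not use splunklib and does not model how Splunk actually splits events into chunks. The helper names (map_chunk, reduce_all, chunks), the '_raw' field, and the chunk size of 2 are all assumptions made for illustration. Note that each mapper call yields one record per unique word in its chunk, so the reducer should see the sum of those per-chunk unique counts.

```python
from collections import Counter


def map_chunk(records, fieldname):
    """Count words in one chunk of events (hypothetical stand-in for map)."""
    total = Counter()
    for record in records:
        # Treat a missing field as empty text instead of raising KeyError.
        total.update(record.get(fieldname, '').split())
    return [{'word': w, 'count': c} for w, c in total.items()]


def reduce_all(records):
    """Merge the per-chunk counts (hypothetical stand-in for reduce)."""
    total = Counter()
    for record in records:
        total[record['word']] += int(record['count'])
    return [{'word': w, 'count': c} for w, c in total.most_common()]


def chunks(seq, size):
    """Yield consecutive slices of seq; the chunk size is an assumption."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


events = [
    {'_raw': 'a b a'},
    {'_raw': 'b c'},
    {'_raw': 'a d'},
    {'_raw': 'c c e'},
]

mapped = []
for chunk in chunks(events, 2):
    mapped.extend(map_chunk(chunk, '_raw'))

reduced = reduce_all(mapped)

# Records the reducer should receive: unique words per chunk, summed.
print(len(mapped))
# Total word occurrences must survive the reduce step unchanged.
print(sum(r['count'] for r in reduced))
```

Comparing these two numbers with the counts logged by the real map and reduce methods may help show whether records are lost between the two phases or whether the expected total was computed differently.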

DeronJensen
Explorer

I don't know if this is the issue, but look at line 16:

         word_cnt += count

I think there are two different variables here, 'cnt' and 'count'. At line 16, 'count' is not defined.


dcagatay
Explorer

No, that is not the issue. I didn't post the actual code, only something very similar to give the gist. The actual code doesn't produce that kind of error.
