Why doesn't the reducer get all events from the mapper function in my custom reporting command?

dcagatay
Explorer

I am trying to write a custom reporting command that finds the top words. It seems to work, but some data is not transferred from the mapper to the reducer. For example, if each mapper invocation processes 10 events and yields 100 words, the reducer should receive 100 words times the number of mapper invocations, but it does not. Some of the words yielded by the mapper never reach the reducer.

My mapper and reducer implementation is below.

@Configuration()
def map(self, records):
    self.logger.debug('TopWordsCommand.map')
    fieldname = self.field
    total = {}
    cnt = 0
    word_cnt = 0

    for record in records:
        text = record[fieldname]
        for word in text.split():
            if word in total:
                total[word] = int(total[word]) + 1
            else:
                total[word] = 1
            word_cnt += 1
        cnt += 1

    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info('Finished map. Processed {} events and {} words.'.format(cnt, word_cnt))

def reduce(self, records):
    self.logger.debug('TopWordsCommand.reduce')
    total = {}
    word_cnt = 0
    uniq_word_cnt = 0

    for record in records:
        word = record['word']
        count = record['count']
        word_cnt += 1

        if word in total:
            total[word] += int(count)
        else:
            total[word] = int(count)
            uniq_word_cnt += 1

    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info("Finished reduce. Total number of words {}, unique words {}".format(word_cnt, uniq_word_cnt))
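
One way to check whether the counting logic itself drops words, independent of Splunk, is the standalone sketch below. It does not use the Splunk SDK: `map_chunk`, `reduce_records`, and the sample events are hypothetical stand-ins for the methods above, and it assumes the mapper runs once per chunk of events and the reducer sees every record the mappers yield.

```python
from collections import Counter
from itertools import chain


def map_chunk(records, fieldname):
    """Hypothetical stand-in for map(): count words in one chunk of events."""
    total = Counter()
    for record in records:
        total.update(record[fieldname].split())
    for word, count in total.items():
        yield {'word': word, 'count': count}


def reduce_records(records):
    """Hypothetical stand-in for reduce(): sum the per-chunk counts."""
    total = Counter()
    for record in records:
        total[record['word']] += int(record['count'])
    for word, count in total.items():
        yield {'word': word, 'count': count}


# Made-up sample events, split into two chunks as if map() ran twice.
chunks = [
    [{'text': 'error disk full'}, {'text': 'disk error'}],
    [{'text': 'error network down'}],
]

# Concatenate everything the mappers yield and feed it to the reducer.
mapped = list(chain.from_iterable(map_chunk(c, 'text') for c in chunks))
reduced = {r['word']: r['count'] for r in reduce_records(mapped)}

# Every word seen by the mappers should be accounted for after reduce.
words_in = sum(len(e['text'].split()) for c in chunks for e in c)
print(len(mapped), words_in, sum(reduced.values()))
```

If the totals match here but not inside Splunk, the loss is more likely in how records travel between the two phases than in the counting itself.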

DeronJensen
Explorer

I don't know if this is the issue, but look at line 16:

         word_cnt += count

I think these are two different variables, 'cnt' and 'count'. At line 16, 'count' is not defined.


dcagatay
Explorer

No, that is not the issue. I didn't post the actual code; what I posted is a simplified version, very similar to the original, to give the gist. The actual code doesn't raise that kind of error.
