Why does a search work in streaming mode but fail when the job switches to map/reduce?

lob5ter
Engager

Greetings,

I hope you can help with the following.
This query works when in streaming mode, but fails when the job switches to map/reduce:

index=hdpidx event.original="*\"EventId\":\"4776\"*"
| rex field=event.original "TargetUserName\":\"(?<foo>.+?)\""
| ldapfilter search="(sAMAccountNAme=$foo$)" attrs="name"
| table foo, name

It begins returning results, but once it switches to map/reduce we see the following error repeated for multiple Hadoop worker nodes:

[hdpprov] [w001.cluster] External search command 'ldapfilter' returned error code 1. Script output = "error_message=TypeError at "/tmp/splunk/splunk.hdpidx/splunk/var/run/searchpeers/splunk.hdpidx-1585082701/apps/SA-ldapsearch/bin/packages/splunklib/binding.py", line 478 : int() argument must be a string or a number, not 'NoneType' ". 

[hdpprov] [w003.cluster] External search command 'ldapfilter' returned error code 1. Script output = "error_message=TypeError at "/tmp/splunk/splunk.hdpidx/splunk/var/run/searchpeers/splunk.hdpidx-1585082701/apps/SA-ldapsearch/bin/packages/splunklib/binding.py", line 478 : int() argument must be a string or a number, not 'NoneType' ". 

... error repeated for each of the worker nodes participating in the job ...

I do not believe this is an issue with ldapfilter itself, but rather with an interaction between $var$ expansion and Hunk.

If I add a head 10 directly after the initial search (before the rex), the search runs successfully, which IMHO points to the issue occurring only during map/reduce.

I have verified that the worker nodes have network access to the AD servers.

Could this be an escaping issue where $foo$ is being replaced with a (non-existent) environment variable? Hence the NoneType?

Any thoughts would be much appreciated!

lob5ter
Engager

For anyone with the same problem: I've solved this a different way. We take a full copy of our AD entries with ldapsearch on a daily basis and use that data in a lookup instead. The added benefit is that we're not querying AD per event; the downside is that it's a big lookup and the data can be about a day old.
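A rough sketch of what that can look like, in case it helps. This is not our exact configuration: the lookup file name ad_users.csv and the LDAP filter are placeholders I've picked for illustration, and it assumes the default domain configured in SA-ldapsearch. The first search would run on a daily schedule to build the lookup; the second replaces the ldapfilter step in the original query.

```spl
| ldapsearch search="(&(objectClass=user)(!(objectClass=computer)))" attrs="sAMAccountName,name"
| table sAMAccountName, name
| outputlookup ad_users.csv

index=hdpidx event.original="*\"EventId\":\"4776\"*"
| rex field=event.original "TargetUserName\":\"(?<foo>.+?)\""
| lookup ad_users.csv sAMAccountName AS foo OUTPUT name
| table foo, name
```

Note that CSV lookups match case-sensitively by default, so if the user names in the events differ in case from sAMAccountName in AD, a lookup definition with case-insensitive matching may be needed.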
