Re: Help needed with foreach query for LDAP user i...

D3mby · ‎03-28-2023

Hey All,

Been banging my head for a few days with this one and will appreciate any feedback on the topic.
The scenario is the following:
- LDAP data is polled via ldapsearch and written to a csv lookup table - ldap_users
- The use case is to return all subordinates under a specific person in the company

Here's an example of the end goal:

Consider the following 5 lines in the csv lookup

cn displayName mail dn manager
dm123 Dean, Martin [email protected] CN=dm123,OU=T,OU=E,OU=S -
ep123 Elvis, Presley [email protected] CN=ep123,OU=T,OU=E,OU=S dm123
mj123 Michael, Jordan [email protected] CN=mj123,OU=T,OU=E,OU=S ep123
bc123 Bill, Clinton [email protected] CN=bc123,OU=T,OU=E,OU=S mj123
ba123 Buzz, Aldrin [email protected] CN=ba123,OU=T,OU=E,OU=S mj123

In this case, dm123 is the CEO of the company and is the direct manager of ep123. ep123 manages 1 person mj123, who then manages two people - bc123 and ba123.

What is needed - if I run a query for the user dm123 to receive everybody under them in the company structure, so in this case, all other 4 users.

So far I'm swinging at using foreach to iterate over the ldap tree, but I can't figure out how to do multiple iterations.
Here's my search so far:

In manager= in the above, you would enter the person from who the iteration should begin. In this case, the CEO. The output returned is the cn, displayName, mail, dn for all their direct subordinates - ep123.
I then take the dn for them (ep123) and do a foreach, looking for any users who have a manager who's equal to the dn - this would be mj123. This provides me with the following output:

cn displayName mail dn mail_employee
ep123 Elvis, Presley [email protected] CN=ep123,OU=T,OU=E,OU=S [email protected]

This is good, but now I have two roadblocks:

- How to use mj123 (any piece of information-mail/dn/cn, doesn't matter which we chose for the example) to iterate once more and return bc123 and ba123.

Any help will be extremely appreciated!

scelikok · ‎03-29-2023

Hi @D3mby,

Based on @yuanliu solution, I followed another way. Below works with sample data. Could you please give it a try? You should adapt lookup lines and temp m_* variables according your count of management levels.

| inputlookup ldap_users.csv 
| eventstats values(cn) as underlings by manager 
| rename manager as m_3 
| lookup ldap_users.csv cn as m_3 OUTPUT manager as m_2 
| lookup ldap_users.csv cn as m_2 OUTPUT manager as m_1 
| lookup ldap_users.csv cn as m_1 OUTPUT manager as m_0 
| foreach m_* 
    [ eval mng='<<FIELD>>' 
    | eval managers=mvappend(managers,mng) ] 
| fields - m_* - mng - underlings 
| stats values(*) as * by managers 
| where managers!="-"

If this reply helps you an upvote and "Accept as Solution" is appreciated.

yuanliu · ‎03-28-2023

A few months ago, there was another question about org chart, but the user quickly deleted the question. Although the data format is different, you can use the same technique.

First, let's construct the org chart. Assume this Dean Martin empire has no more than six levels of management.

| inputlookup ldap_users
| eval flatdata = cn . ":" . if(manager == "-", "", manager)
| eventstats values(flatdata) AS flatdata
| foreach 1 2 3 4 5 6 ``` assuming no more than 6 levels ```
    [| eval manager = mvappend(mvindex(split(mvindex(flatdata, mvfind(flatdata, "^" . mvindex(manager, 0) . ":")), ":"), 1), manager)]

What this gives you is a manager field that contains all upperlings all the way up to Dean Martin.

cn	displayName	dn	flatdata	mail	manager
dm123	Dean, Martin	CN=dm123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	-
ep123	Elvis, Presley	CN=ep123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123
mj123	Michael, Jordan	CN=mj123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123
bc123	Bill, Clinton	CN=bc123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123 mj123
ba123	Buzz, Aldrin	CN=ba123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123 mj123

With this table, you can easily search up Elvis and all his underlings, e.g., using search command.

| search cn = ep123 OR manager = ep123 ``` gimme Elvis and his underlings ```

cn	displayName	dn	flatdata	mail	manager
ep123	Elvis, Presley	CN=ep123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123
mj123	Michael, Jordan	CN=mj123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123
bc123	Bill, Clinton	CN=bc123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123 mj123
ba123	Buzz, Aldrin	CN=ba123,OU=T,OU=E,OU=S	ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123	[email protected]	dm123 ep123 mj123

If your goal is to lookup Elvis specifically, you can construct any display you like from the above.

If your goal is to lookup underlings of all managers, you first sum them up using stats:

| stats values(cn) as underlings by manager
| where manager != "-"

manager	underlings
dm123	ba123 bc123 ep123 mj123
ep123	ba123 bc123 mj123
mj123	ba123 bc123

Then apply your original lookup

| lookup ldap_users cn as underlings

Put all this together,

| inputlookup ldap_users
| eval flatdata = if(manager == "-", null(), cn . ":" . manager)
| eventstats values(flatdata) AS flatdata
| foreach 1 2 3 4 5 6
    [| eval manager = mvappend(mvindex(split(mvindex(flatdata, mvfind(flatdata, "^" . mvindex(manager, 0) . ":")), ":"), 1), manager)]
| stats values(cn) as underlings by manager
| where manager != "-"
| lookup ldap_users cn as underlings

The following is data emulation that you can play with and compare with your real data.

| makeresults
| fields - _time
| eval _raw = "
cn      displayName     mail                    dn                        manager                                 
dm123   Dean, Martin    [email protected]     CN=dm123,OU=T,OU=E,OU=S   -
ep123   Elvis, Presley  [email protected]   CN=ep123,OU=T,OU=E,OU=S   dm123
mj123   Michael, Jordan [email protected]  CN=mj123,OU=T,OU=E,OU=S   ep123
bc123   Bill, Clinton   [email protected]    CN=bc123,OU=T,OU=E,OU=S   mj123
ba123   Buzz, Aldrin    [email protected]     CN=ba123,OU=T,OU=E,OU=S   mj123"
| multikv forceheader=1
| fields - _raw linecount
``` data emulation above for
| inputlookup ldap_users ```

D3mby · ‎03-28-2023

Hey!
Thanks so much for this, it's really close to what I'm after 😁

The example works great with small datasets, but when you create the "flatdata" field, you basically output every combination of user+manager from all available events that you are fetching from the dataset - in my case, the csv.

As you can see, I'm limiting the search to retrieve 10 results, meaning that each event's flatdata field will have 10 values(mv) in it, as it aggregates all user+manager combination from the other events as well.

This is also true when we use the dummy dataset:

Now the csv that holds the user information is somewhere around 96K rows.
The way I'm interpreting this, it will require each of these 96k rows to hold 96k entries about a user + manager combination in the flatdata field - no need to say, the search doesn't complete at all when I attempt this.

yuanliu · ‎03-29-2023

Yes, I have to admit that I have never run inputlookup on 96K lines. Saving a 96K-value field with each row is probably not the best use of RAM. However, since all Splunk gets is a file and not a database, the data will have to be held somewhere. So, I don't see an easy way out.

Some thoughts.

Run an external query to LDAP itself and produce an underling array on-demand as @ITWhisperer suggested. It will carry performance cost, of course.
Examine the use case more closely and make some choices. See if it allows you to limit input. For example, if there is some department code in LDAP and the org chart follows these codes. Or if there is a employee level code and the level is automatically enforced with some known algorithm.
Run a query and store underlings chart in a CSV or KV store periodically.
If anything, examine limits.conf to lift some memory and timeout limits. I do not quite know which knobs will help, however. 96K^2 = 9.2 billion. That is a lot.

Speaking of limit inputs, you can restrict output of inputlookup to cn and manager because all other fields are irrelevant to this task. But compared with the N^2 factor, this will not help the big problem.

D3mby · ‎03-29-2023

My thoughts exactly, but your suggestion has helped me plenty.

Will experiment a bit further and post any positive results that I get.

ITWhisperer · ‎03-28-2023

Splunk essentially processes each event one at a time. Some commands work on the multiple events from the pipeline before producing an output event, but still each command processes the pipeline once.

The foreach command operates on each event one at a time for each field mentioned in the field list.

In order to do what you want, you would have to process the event pipeline multiple times. How many times you depend on your data. There is no way to tell ahead of time how many iterations would be required.

One way of solving this might be to write a custom command that essentially takes all the events and processes them as a set. Good luck with that. 😀

D3mby · ‎03-28-2023

Thanks, thought there might be a quick way around this that I'm missing

Help needed with foreach query for LDAP user information

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Federated Search for Snowflake Is Now Generally Available on Splunk Cloud Platform

Help Us Build Better Splunk Regex Puzzles (And Win Prizes!)

Fuel Your Journey: What’s Waiting for You at the .conf26 Acceleration Station

Join the Conversation