Hey All,
Been banging my head for a few days with this one and will appreciate any feedback on the topic.
The scenario is the following:
- LDAP data is polled via ldapsearch and written to a csv lookup table - ldap_users
- The use case is to return all subordinates under a specific person in the company
Here's an example of the end goal:
Consider the following 5 lines in the csv lookup
cn displayName mail dn manager
dm123 Dean, Martin dean.martin@test.eu CN=dm123,OU=T,OU=E,OU=S -
ep123 Elvis, Presley elvis.presley@test.eu CN=ep123,OU=T,OU=E,OU=S dm123
mj123 Michael, Jordan michael.jordan@test.eu CN=mj123,OU=T,OU=E,OU=S ep123
bc123 Bill, Clinton bill.clinton@test.eu CN=bc123,OU=T,OU=E,OU=S mj123
ba123 Buzz, Aldrin buzz.aldrin@test.eu CN=ba123,OU=T,OU=E,OU=S mj123
In this case, dm123 is the CEO of the company and is the direct manager of ep123. ep123 manages 1 person mj123, who then manages two people - bc123 and ba123.
What is needed - if I run a query for the user dm123 to receive everybody under them in the company structure, so in this case, all other 4 users.
So far I'm swinging at using foreach to iterate over the ldap tree, but I can't figure out how to do multiple iterations.
Here's my search so far:
| inputlookup ldap_users WHERE ( ( manager="CN=dm123,OU=T,OU=E,OU=S" ) )
| search cn!="x*" cn!="y*" cn!="z*"
| table cn, displayName, mail, dn
| rex field=dn "CN\=(?P<dn>[^\,]+)\,"
| foreach dn
[| lookup ldap_users manager as dn OUTPUTNEW mail as mail_employee]
| mvexpand mail_employee
| search mail_employee!="-"
In manager= in the above, you would enter the person from who the iteration should begin. In this case, the CEO. The output returned is the cn, displayName, mail, dn for all their direct subordinates - ep123.
I then take the dn for them (ep123) and do a foreach, looking for any users who have a manager who's equal to the dn - this would be mj123. This provides me with the following output:
cn displayName mail dn mail_employee
ep123 Elvis, Presley elvis.presley@test.eu CN=ep123,OU=T,OU=E,OU=S michael.jordan@test.eu
This is good, but now I have two roadblocks:
- How to use mj123 (any piece of information-mail/dn/cn, doesn't matter which we chose for the example) to iterate once more and return bc123 and ba123.
Any help will be extremely appreciated!
Hi @D3mby,
Based on @yuanliu solution, I followed another way. Below works with sample data. Could you please give it a try? You should adapt lookup lines and temp m_* variables according your count of management levels.
| inputlookup ldap_users.csv
| eventstats values(cn) as underlings by manager
| rename manager as m_3
| lookup ldap_users.csv cn as m_3 OUTPUT manager as m_2
| lookup ldap_users.csv cn as m_2 OUTPUT manager as m_1
| lookup ldap_users.csv cn as m_1 OUTPUT manager as m_0
| foreach m_*
[ eval mng='<<FIELD>>'
| eval managers=mvappend(managers,mng) ]
| fields - m_* - mng - underlings
| stats values(*) as * by managers
| where managers!="-"
A few months ago, there was another question about org chart, but the user quickly deleted the question. Although the data format is different, you can use the same technique.
First, let's construct the org chart. Assume this Dean Martin empire has no more than six levels of management.
| inputlookup ldap_users
| eval flatdata = cn . ":" . if(manager == "-", "", manager)
| eventstats values(flatdata) AS flatdata
| foreach 1 2 3 4 5 6 ``` assuming no more than 6 levels ```
[| eval manager = mvappend(mvindex(split(mvindex(flatdata, mvfind(flatdata, "^" . mvindex(manager, 0) . ":")), ":"), 1), manager)]
What this gives you is a manager field that contains all upperlings all the way up to Dean Martin.
cn | displayName | dn | flatdata | manager | |
dm123 | Dean, Martin | CN=dm123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | dean.martin@test.eu | - |
ep123 | Elvis, Presley | CN=ep123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | elvis.presley@test.eu | dm123 |
mj123 | Michael, Jordan | CN=mj123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | michael.jordan@test.eu | dm123 ep123 |
bc123 | Bill, Clinton | CN=bc123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | bill.clinton@test.eu | dm123 ep123 mj123 |
ba123 | Buzz, Aldrin | CN=ba123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | buzz.aldrin@test.eu | dm123 ep123 mj123 |
With this table, you can easily search up Elvis and all his underlings, e.g., using search command.
| search cn = ep123 OR manager = ep123 ``` gimme Elvis and his underlings ```
cn | displayName | dn | flatdata | manager | |
ep123 | Elvis, Presley | CN=ep123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | elvis.presley@test.eu | dm123 |
mj123 | Michael, Jordan | CN=mj123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | michael.jordan@test.eu | dm123 ep123 |
bc123 | Bill, Clinton | CN=bc123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | bill.clinton@test.eu | dm123 ep123 mj123 |
ba123 | Buzz, Aldrin | CN=ba123,OU=T,OU=E,OU=S | ba123:mj123 bc123:mj123 dm123: ep123:dm123 mj123:ep123 | buzz.aldrin@test.eu | dm123 ep123 mj123 |
If your goal is to lookup Elvis specifically, you can construct any display you like from the above.
If your goal is to lookup underlings of all managers, you first sum them up using stats:
| stats values(cn) as underlings by manager
| where manager != "-"
manager | underlings |
dm123 | ba123 bc123 ep123 mj123 |
ep123 | ba123 bc123 mj123 |
mj123 | ba123 bc123 |
Then apply your original lookup
| lookup ldap_users cn as underlings
Put all this together,
| inputlookup ldap_users
| eval flatdata = if(manager == "-", null(), cn . ":" . manager)
| eventstats values(flatdata) AS flatdata
| foreach 1 2 3 4 5 6
[| eval manager = mvappend(mvindex(split(mvindex(flatdata, mvfind(flatdata, "^" . mvindex(manager, 0) . ":")), ":"), 1), manager)]
| stats values(cn) as underlings by manager
| where manager != "-"
| lookup ldap_users cn as underlings
The following is data emulation that you can play with and compare with your real data.
| makeresults
| fields - _time
| eval _raw = "
cn displayName mail dn manager
dm123 Dean, Martin dean.martin@test.eu CN=dm123,OU=T,OU=E,OU=S -
ep123 Elvis, Presley elvis.presley@test.eu CN=ep123,OU=T,OU=E,OU=S dm123
mj123 Michael, Jordan michael.jordan@test.eu CN=mj123,OU=T,OU=E,OU=S ep123
bc123 Bill, Clinton bill.clinton@test.eu CN=bc123,OU=T,OU=E,OU=S mj123
ba123 Buzz, Aldrin buzz.aldrin@test.eu CN=ba123,OU=T,OU=E,OU=S mj123"
| multikv forceheader=1
| fields - _raw linecount
``` data emulation above for
| inputlookup ldap_users ```
Hey!
Thanks so much for this, it's really close to what I'm after 😁
The example works great with small datasets, but when you create the "flatdata" field, you basically output every combination of user+manager from all available events that you are fetching from the dataset - in my case, the csv.
As you can see, I'm limiting the search to retrieve 10 results, meaning that each event's flatdata field will have 10 values(mv) in it, as it aggregates all user+manager combination from the other events as well.
This is also true when we use the dummy dataset:
Now the csv that holds the user information is somewhere around 96K rows.
The way I'm interpreting this, it will require each of these 96k rows to hold 96k entries about a user + manager combination in the flatdata field - no need to say, the search doesn't complete at all when I attempt this.
Yes, I have to admit that I have never run inputlookup on 96K lines. Saving a 96K-value field with each row is probably not the best use of RAM. However, since all Splunk gets is a file and not a database, the data will have to be held somewhere. So, I don't see an easy way out.
Some thoughts.
Speaking of limit inputs, you can restrict output of inputlookup to cn and manager because all other fields are irrelevant to this task. But compared with the N^2 factor, this will not help the big problem.
My thoughts exactly, but your suggestion has helped me plenty.
Will experiment a bit further and post any positive results that I get.
Splunk essentially processes each event one at a time. Some commands work on the multiple events from the pipeline before producing an output event, but still each command processes the pipeline once.
The foreach command operates on each event one at a time for each field mentioned in the field list.
In order to do what you want, you would have to process the event pipeline multiple times. How many times you depend on your data. There is no way to tell ahead of time how many iterations would be required.
One way of solving this might be to write a custom command that essentially takes all the events and processes them as a set. Good luck with that. 😀
Thanks, thought there might be a quick way around this that I'm missing