Splunk Search

How to join two sources with common one common field ?

zacksoft_wf
Contributor

I have sourcetype A that has info about service_accounts such as name, AU, email , full_name, manager_name.
But some of the events in source A, do not contain the field  email , manager_name, full_name field. In those cases I have to look into another index and sourcetype, say B to fetch those data. AU is the common field name in both . Can we join the data, without having to use 'join' for performance issue ?

 

Labels (2)
0 Karma

johnhuang
Motivator

Assuming  b_name, b_email, b_manager_name, b_full_name are the fieldnames for index "B".

(index=A OR index=B)
| table AU, name, email, full_name, manager_name, b_name, b_email, b_manager_name, b_full_name
| stats MAX(*) AS * BY AU
| eval name=COALESCE(name, b_name)
| eval email=COALESCE(email, b_email)
| eval full_name=COALESCE(full_name, b_full_name)
| eval manager_name=COALESCE(manager_name, b_manager_name)
| table AU, name, email, full_name, manager_name

0 Karma

zacksoft_wf
Contributor

@johnhuang  - I tried with Stats(value) and sharing you the screenshot. Some of them are multivalue fields, I believe thats why the remaining fields are not populating . Stats(max) removes the multivalue with one value , but then again , rest of the fields are not populating, and also I would want each multivalue field to become separate rows.  Please find the attachment and suggest.

0 Karma

Sukisen1981
Champion

the field 'elid' which is coming empty, is that an index field? Also can you check if elid field is missing for all rows or only for some rows

0 Karma

zacksoft_wf
Contributor

@Sukisen1981  Not an indexed field. Just a regular field. And  as i check now elid and others are not empty for all rows.   

Tags (1)
0 Karma

Sukisen1981
Champion

@zacksoft_wf  Sorry, I meant to ask the field elid, it is present in one of the indexes, right? All we have done in the query is make a table of the needed fields followed by a stats values(*). But the bigger question is , and I was suspecting the missing elid is not an issue with the stats. Is it possible for you to check(say for the first row in your screen shot where AU=0), if elid is indeed present in either indexes for AU=0?

PS- I think we are near to the actual solution 🙂 its just that we need to have a better understanding of the fields in your index

0 Karma

zacksoft_wf
Contributor

  I tried @johnhuang 's solution with a change , it populates but not quite the way i wanted.

=========

(`index=A) OR (index=B)
| eval pwd_expires=if(nopassexpire==1, "True", "False"), account_type=if(type=="S", "Service Account", account_type)
| table is_interactive,account_name,cn,au,acct_name,elid,full_name,full_name,email_address,manager_name,service_account_name,job_title,lob,pwd_expires,service_accout_name,account_type,service_account_id,service_account_id,owner_elid,au_owner_name,au_owner_email
| eventstats MAX(*) AS * BY au
| eval elid=coalesce(elid,owner_elid)
| eval au_owner_email=coalesce(au_owner_email,email_address)
| eval au_owner_name=coalesce(au_owner_name,full_name)
| eval service_accout_name=coalesce(service_account_name,cn)
| eval service_account_id=coalesce(service_account_id,app_id)
| rename acct_name as user, account_type as type| eval user=lower(user)

| table user type pwd_expires is_interactive service_account_id service_account_name au au_owner_name job_title au_owner_email elid manager_name lob

=========

I changed stats to eventstats , it populates value, but i think because of the usage of MAX() function I am not seeing multiple service_account_id or service_acount_names asociated to each au. I am just seeing one entry. But in real the multivalue fields should be split into different rows , but now MAX() messes things up,  and stats values() is creating some multivalue fileds !!!!     stuck !!

0 Karma

johnhuang
Motivator

Instead of MAX, you can use VALUES to list out the multivalue fields.

 

VALUES(*) AS *

0 Karma

zacksoft_wf
Contributor

for every 'au', there are multiple service_account_name, service_account_id, user. hence a multivalue field creation while using eventstats using max() function is just giving me one value of service_account_name, service_account_id, user per each 'au' . And that's incorrect. If I could see all the multivalue fields split , like multiple au rows for each combination of its associated service_account_name etc field,that would be helpful. 'cos in the end I intent to write this result into a kvstore lookup.

0 Karma

johnhuang
Motivator

Help me understand this better.

Could you list all the inscope fields of Index A and Index B seperately and indicate which index is being used for the enrichment? What does AU stand for?

 

0 Karma

zacksoft_wf
Contributor

Index A fields : account_name,cn,au,acct_name,elid,full_name,email_address,manager_name
Index B fields : au,service_account_name,job_title,lob,account_type,service_account_id,owner_elid,au_owner_name,au_owner_email

Example : Lets say, if "account_name" has no value from index A , then coalesce should populate it from "au_owner_name". (Thats what I am coalescing for a couple of fields)

Also, stats values(*)  by au, doesn't yield result(by combining from both index)  but when  eventstats does.
However eventstats values(*) creates multivalue fields amd mvexpand won't help. using eventstats max(*)
is helping, but max() function is fetching me just one value out of the mvfield.
I tried mvexpand to flatten some fields, but it didn't flatten.  

Here is the code,

===========

(index=A) OR (index=B)

| table is_interactive,account_name,cn,au,acct_name,elid,full_name,email_address,manager_name,service_account_name,job_title,lob,pwd_expires,account_type,service_account_id,owner_elid,au_owner_name,au_owner_email

| eventstats values(*) AS * BY au

| mvexpand user
| mvexpand service_account_name
| mvexpand service_account_id


| eval elid=coalesce(elid,owner_elid)
| eval au_owner_email=coalesce(au_owner_email,email_address)
| eval au_owner_name=coalesce(au_owner_name,full_name)
| eval service_accout_name=coalesce(service_account_name,cn,account_name)
| eval service_account_id=coalesce(service_account_id,app_id)
| rename acct_name as user, account_type as type, lob as bunit
| eval user=lower(user)
| dedup user
| rex field=user "[^:]+:(?<user>[^\s]+)"
| eval _key=user+".key"
| fields - owner_elid service_account_name service_account_id cn account_name

===========

0 Karma

johnhuang
Motivator

Ok, the way I understand it is that you are using fields from B to enrich A and lacking common field with unique id/key to combine the records.

You should try to identify some fields and values that is common to both A and B for grouping. For example, if the both datasets have an "account name" but in different columns, you can normalize it into a new field. 

| eval common_account_name=LOWER(COALESCE(service_account_name,cn,account_name))

 

(`index=A) OR (index=B)
| table is_interactive account_name cn au acct_name elid full_name email_address manager_name service_account_name job_title lob pwd_expires account_type service_account_id owner_elid au_owner_name au_owner_email
| eval common_account_name=LOWER(COALESCE(service_account_name,cn,account_name))
| eventstats MAX(job_title) AS job_title MAX(lob) AS lob MAX(account_type) AS account_type MAX(owner_elid) AS owner_elid au_owner_name MAX(au_owner_name) AS au_owner_name MAX(au_owner_email) AS au_owner_email BY au common_account_name
| search account_name=*

 

 

0 Karma

johnhuang
Motivator

Yes that's easy to do. Instead of MAX, use VALUES(*) AS *.

Then run this command for each mutivalued field that you want to expand/flatten. 

| mvexpand field_name_1
| mvexpand field_name_2

 

0 Karma

Sukisen1981
Champion

@zacksoft_wf  eventstats wont give you the correct answer, eventstats over a field will return a function(like max or min) of all the events. Coming back to my question , were you able to see why some field values for eild was missing and some came in the stats values(*) query? If we can fix that our job is almost done, for then we just need to use the mvzip and(or) mvexpand functions to get the desired output.

Can you investigate a bit more on the empty/non empty values for eild?

Tags (1)
0 Karma

zacksoft_wf
Contributor

@Sukisen1981  "elid" is present for some events and absent for some.

0 Karma

Sukisen1981
Champion

I got that, what i am asking is can you see any pattern or condition under which elid is missing and elid is coming? I mean to say, is it possible to see the raw events which are causing empty elids and events which are returning elid multivalues? Maybe something to do with the way events are being logged in the indexes

0 Karma

zacksoft_wf
Contributor

@Sukisen1981  Actually , elid is not a multivalue field. user, service_account_name, service_account_id are.   
elid is present is some events (not all), and that is okay for us. There is no pattern .

0 Karma

Sukisen1981
Champion

Are you sure you want to use a join? what you need is something like this - 

(index=bayseian OR index =alt)|stats values (field1, field2) by AU

Here is what I want you to do, limit your query first to 2 fields (field1 and field2 are just placeholders) and assuming AU is your common field , retrofit the above the search and run it. See the output of the above query, I am sure you will get what I am hinting at. If you manage to get all the field values  from the above query,the rest of the evals, regexes can be applied later, try it out and let us know

0 Karma

zacksoft_wf
Contributor

In my case , I have to search for fields in index bayseian, if the field/fields is not found (means no value), I have to check the other index, where that field might be there but with a different field name. And there is one common field between these two datasources  to correlate .


0 Karma

Sukisen1981
Champion

if that's ok then going back to your original screen shot if we can get one row for each multivalue field, does that solve your requirement?

0 Karma

zacksoft_wf
Contributor

@Sukisen1981  yes

0 Karma
Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...