Re: How to join two sources with common one common...

zacksoft_wf · ‎12-01-2021

I have sourcetype A that has info about service_accounts such as name, AU, email , full_name, manager_name.
But some of the events in source A, do not contain the field email , manager_name, full_name field. In those cases I have to look into another index and sourcetype, say B to fetch those data. AU is the common field name in both . Can we join the data, without having to use 'join' for performance issue ?

johnhuang · ‎12-01-2021

Assuming b_name, b_email, b_manager_name, b_full_name are the fieldnames for index "B".

(index=A OR index=B)
| table AU, name, email, full_name, manager_name, b_name, b_email, b_manager_name, b_full_name
| stats MAX(*) AS * BY AU
| eval name=COALESCE(name, b_name)
| eval email=COALESCE(email, b_email)
| eval full_name=COALESCE(full_name, b_full_name)
| eval manager_name=COALESCE(manager_name, b_manager_name)
| table AU, name, email, full_name, manager_name

zacksoft_wf · ‎12-02-2021

@johnhuang - I tried with Stats(value) and sharing you the screenshot. Some of them are multivalue fields, I believe thats why the remaining fields are not populating . Stats(max) removes the multivalue with one value , but then again , rest of the fields are not populating, and also I would want each multivalue field to become separate rows. Please find the attachment and suggest.

Sukisen1981 · ‎12-02-2021

the field 'elid' which is coming empty, is that an index field? Also can you check if elid field is missing for all rows or only for some rows

zacksoft_wf · ‎12-02-2021

@Sukisen1981 Not an indexed field. Just a regular field. And as i check now elid and others are not empty for all rows.

Sukisen1981 · ‎12-02-2021

@zacksoft_wf Sorry, I meant to ask the field elid, it is present in one of the indexes, right? All we have done in the query is make a table of the needed fields followed by a stats values(*). But the bigger question is , and I was suspecting the missing elid is not an issue with the stats. Is it possible for you to check(say for the first row in your screen shot where AU=0), if elid is indeed present in either indexes for AU=0?

PS- I think we are near to the actual solution 🙂 its just that we need to have a better understanding of the fields in your index

zacksoft_wf · ‎12-02-2021

I tried @johnhuang 's solution with a change , it populates but not quite the way i wanted.

=========

(`index=A) OR (index=B)
| eval pwd_expires=if(nopassexpire==1, "True", "False"), account_type=if(type=="S", "Service Account", account_type)
| table is_interactive,account_name,cn,au,acct_name,elid,full_name,full_name,email_address,manager_name,service_account_name,job_title,lob,pwd_expires,service_accout_name,account_type,service_account_id,service_account_id,owner_elid,au_owner_name,au_owner_email
| eventstats MAX(*) AS * BY au
| eval elid=coalesce(elid,owner_elid)
| eval au_owner_email=coalesce(au_owner_email,email_address)
| eval au_owner_name=coalesce(au_owner_name,full_name)
| eval service_accout_name=coalesce(service_account_name,cn)
| eval service_account_id=coalesce(service_account_id,app_id)
| rename acct_name as user, account_type as type| eval user=lower(user)

| table user type pwd_expires is_interactive service_account_id service_account_name au au_owner_name job_title au_owner_email elid manager_name lob

=========

I changed stats to eventstats , it populates value, but i think because of the usage of MAX() function I am not seeing multiple service_account_id or service_acount_names asociated to each au. I am just seeing one entry. But in real the multivalue fields should be split into different rows , but now MAX() messes things up, and stats values() is creating some multivalue fileds !!!! stuck !!

johnhuang · ‎12-02-2021

Instead of MAX, you can use VALUES to list out the multivalue fields.

VALUES(*) AS *

zacksoft_wf · ‎12-02-2021

for every 'au', there are multiple service_account_name, service_account_id, user. hence a multivalue field creation while using eventstats using max() function is just giving me one value of service_account_name, service_account_id, user per each 'au' . And that's incorrect. If I could see all the multivalue fields split , like multiple au rows for each combination of its associated service_account_name etc field,that would be helpful. 'cos in the end I intent to write this result into a kvstore lookup.

johnhuang · ‎12-02-2021

Help me understand this better.

Could you list all the inscope fields of Index A and Index B seperately and indicate which index is being used for the enrichment? What does AU stand for?

zacksoft_wf · ‎12-03-2021

Index A fields : account_name,cn,au,acct_name,elid,full_name,email_address,manager_name
Index B fields : au,service_account_name,job_title,lob,account_type,service_account_id,owner_elid,au_owner_name,au_owner_email

Example : Lets say, if "account_name" has no value from index A , then coalesce should populate it from "au_owner_name". (Thats what I am coalescing for a couple of fields)

Also, stats values(*) by au, doesn't yield result(by combining from both index) but when eventstats does.
However eventstats values(*) creates multivalue fields amd mvexpand won't help. using eventstats max(*)
is helping, but max() function is fetching me just one value out of the mvfield.
I tried mvexpand to flatten some fields, but it didn't flatten.

Here is the code,

===========

(index=A) OR (index=B)

| table is_interactive,account_name,cn,au,acct_name,elid,full_name,email_address,manager_name,service_account_name,job_title,lob,pwd_expires,account_type,service_account_id,owner_elid,au_owner_name,au_owner_email

| eventstats values(*) AS * BY au

| mvexpand user
| mvexpand service_account_name
| mvexpand service_account_id

| eval elid=coalesce(elid,owner_elid)
| eval au_owner_email=coalesce(au_owner_email,email_address)
| eval au_owner_name=coalesce(au_owner_name,full_name)
| eval service_accout_name=coalesce(service_account_name,cn,account_name)
| eval service_account_id=coalesce(service_account_id,app_id)
| rename acct_name as user, account_type as type, lob as bunit
| eval user=lower(user)
| dedup user
| rex field=user "[^:]+:(?<user>[^\s]+)"
| eval _key=user+".key"
| fields - owner_elid service_account_name service_account_id cn account_name

===========

johnhuang · ‎12-03-2021

Ok, the way I understand it is that you are using fields from B to enrich A and lacking common field with unique id/key to combine the records.

You should try to identify some fields and values that is common to both A and B for grouping. For example, if the both datasets have an "account name" but in different columns, you can normalize it into a new field.

| eval common_account_name=LOWER(COALESCE(service_account_name,cn,account_name))

(`index=A) OR (index=B)
| table is_interactive account_name cn au acct_name elid full_name email_address manager_name service_account_name job_title lob pwd_expires account_type service_account_id owner_elid au_owner_name au_owner_email
| eval common_account_name=LOWER(COALESCE(service_account_name,cn,account_name))
| eventstats MAX(job_title) AS job_title MAX(lob) AS lob MAX(account_type) AS account_type MAX(owner_elid) AS owner_elid au_owner_name MAX(au_owner_name) AS au_owner_name MAX(au_owner_email) AS au_owner_email BY au common_account_name
| search account_name=*

johnhuang · ‎12-02-2021

Yes that's easy to do. Instead of MAX, use VALUES(*) AS *.

Then run this command for each mutivalued field that you want to expand/flatten.

| mvexpand field_name_1
| mvexpand field_name_2

Sukisen1981 · ‎12-02-2021

@zacksoft_wf eventstats wont give you the correct answer, eventstats over a field will return a function(like max or min) of all the events. Coming back to my question , were you able to see why some field values for eild was missing and some came in the stats values(*) query? If we can fix that our job is almost done, for then we just need to use the mvzip and(or) mvexpand functions to get the desired output.

Can you investigate a bit more on the empty/non empty values for eild?

zacksoft_wf · ‎12-02-2021

@Sukisen1981 "elid" is present for some events and absent for some.

Sukisen1981 · ‎12-02-2021

I got that, what i am asking is can you see any pattern or condition under which elid is missing and elid is coming? I mean to say, is it possible to see the raw events which are causing empty elids and events which are returning elid multivalues? Maybe something to do with the way events are being logged in the indexes

zacksoft_wf · ‎12-02-2021

@Sukisen1981 Actually , elid is not a multivalue field. user, service_account_name, service_account_id are.
elid is present is some events (not all), and that is okay for us. There is no pattern .

Sukisen1981 · ‎12-01-2021

Are you sure you want to use a join? what you need is something like this -

(index=bayseian OR index =alt)|stats values (field1, field2) by AU

Here is what I want you to do, limit your query first to 2 fields (field1 and field2 are just placeholders) and assuming AU is your common field , retrofit the above the search and run it. See the output of the above query, I am sure you will get what I am hinting at. If you manage to get all the field values from the above query,the rest of the evals, regexes can be applied later, try it out and let us know

zacksoft_wf · ‎12-01-2021

In my case , I have to search for fields in index bayseian, if the field/fields is not found (means no value), I have to check the other index, where that field might be there but with a different field name. And there is one common field between these two datasources to correlate .

Sukisen1981 · ‎12-02-2021

if that's ok then going back to your original screen shot if we can get one row for each multivalue field, does that solve your requirement?

zacksoft_wf · ‎12-02-2021

@Sukisen1981 yes

How to join two sources with common one common field ?

join

stats

How to Monitor Google Kubernetes Engine (GKE)

Index This | How can you make 45 using only 4?

Splunk Education Goes to Washington | Splunk GovSummit 2024