Hi All,
I need to correlate data from 2 different Indexes wherein the field name is common.
Index=idx1 ( This index has general user info)
Field Name: sys_created_by
Value: <email id of the user>
Other fields in idx1 of interest: login_time
Index=idx2 ( This is the Index which has URLs accessed by the user)
Field Name: sys_created_by
Value: <email id of the user>
The url information is stored in a field called "url" in idx2.
Use case is to take the sys_created_by field from IDX1 and lookup/search for all urls in IDX2 accessed by the sys_created_by coming from idx1. I cannot rely on sys_created_by field from idx2 alone as it doesn't have all the other user attributes that are in IDX1 such as login_time. Hence i need to correlate data across the two indexes.
Do i need to do which will merge the sys_created_by from both indexes ?
eval common_field = coalesce(sys_created_by, sys_created_by)
I tried something like :
(index=idx1 sys_created_by!="") OR (index=idx2 sys_created_by!="" url!="")
| stats values(url) values(login_time) BY sys_created_by
But this doesn't show results as expected.
Is there a way to reference my common field like shown below in BY ; to tell Splunk which idx it needs to refer ?
| stats values(url), values(login_time) BY ( idx2.sys_created_by)
Coalesce is not needed, there is effectively only one field 'sys_created_by' and it will have a value depending on which index it comes from
Coalesce is used in this scenario, where the field name is different, e.g.
| eval common_field=coalesce(field_from_idx1, field_from_idx2)
| stats xxxx by common_field
but you don't need that.
Your approach of
(index=idx1 sys_created_by!="") OR (index=idx2 sys_created_by!="" url!="")
| stats values(url) values(login_time) BY sys_created_by
should be exactly what you need. It does not matter that the data has the same field in both indexes, splunk does not care which index the event is from. If sys_created_by="XX" it will be the same for both indexes, so your stats values command will group all urls and login times for XX
What results did it show and why was it unexpected?
Thanks for responding. But do i need to use the coalesce command before the stats command ? Or is the coalesce not required at all and i can directly run the stats ?
coalesce works on different fields in the same event not the same field in different events
Although in the case of two data sets, it would be different fields in different events if only one of the fields is present in each data set. Thereby creating the 'join' field.
Coalesce is not needed, there is effectively only one field 'sys_created_by' and it will have a value depending on which index it comes from
Coalesce is used in this scenario, where the field name is different, e.g.
| eval common_field=coalesce(field_from_idx1, field_from_idx2)
| stats xxxx by common_field
but you don't need that.
bowesmana / @ITWhisperer
what about the case, when the common_field changes values across indexes ?
For example:
In Idx1 , i have sys_created_by = johnk@domain.com
In idx2, sys_created_by = some_other_user@domain.com but user_email=johnk@domain.com
If i want to do a stats grouping for johnk@domain.com , how to go about this ? The below coalesce command does not show desired results.
| eval common_field= coalesce(sys_created_by, user_email)
common_field ends up showing the value of some_other_user@domain.com .
But my requirement is to grab johnk@domain.com into some common field and leverage it further to stats group by .
Pls advise.
You need to decide which data (john@) is important and which field that comes from.
If your data has
idx1 (sys_created_by)
idx2 (sys_created_by and user_email)
then IFF user_email contains the DATA that is important to your grouping, then as idx1 does NOT contain that field, you would use any one of these three constructs (all will produce the same outcome)
| eval group_by_field=coalesce(user_email, sys_created_by)
| eval group_by_field=if(isnotnull(user_email), user_email, sys_created_by)
| eval group_by_field=if(index="idx2", user_email, sys_created_by)
Hope this helps