Splunk Search

How to reference same field name from 2 different indexes when correlating ?

neerajs_81
Builder

Hi All,

I need to correlate data from 2 different Indexes wherein the field name is common.  

Index=idx1  ( This index has general user info) 
Field Name:  sys_created_by
Value: <email id of the user>
Other fields in idx1 of interest: login_time

Index=idx2  ( This is the Index which has URLs accessed by the user)
Field Name:  sys_created_by
Value: <email id of the user>

The url  information is stored in a field called "url" in idx2.

Use case is to take the sys_created_by field from IDX1  and lookup/search for all urls  in IDX2  accessed by  the   sys_created_by coming from idx1.    I cannot rely on sys_created_by field from idx2 alone as it doesn't have all the other user attributes that are in IDX1 such as login_time.   Hence i need to correlate data across the two indexes.

Do i need to do which will merge the sys_created_by from both indexes ?

 

eval common_field = coalesce(sys_created_by, sys_created_by)

 


I tried something like :

 

(index=idx1 sys_created_by!="") OR (index=idx2 sys_created_by!="" url!="")
| stats values(url) values(login_time) BY sys_created_by

 

But this doesn't show results as expected. 
Is there a way to reference my common field like shown below in BY ;  to tell Splunk which idx it needs to refer ?

 

| stats values(url), values(login_time) BY ( idx2.sys_created_by)

 





Labels (3)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

Coalesce is not needed, there is effectively only one field 'sys_created_by' and it will have a value depending on which index it comes from

Coalesce is used in this scenario, where the field name is different, e.g.

| eval common_field=coalesce(field_from_idx1, field_from_idx2)
| stats xxxx by common_field

but you don't need that.

 

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

Your approach of 

(index=idx1 sys_created_by!="") OR (index=idx2 sys_created_by!="" url!="")
| stats values(url) values(login_time) BY sys_created_by

should be exactly what you need. It does not matter that the data has the same field in both indexes, splunk does not care which index the event is from. If sys_created_by="XX" it will be the same for both indexes, so your stats values command will group all urls and login times for XX

What results did it show and why was it unexpected?

 

neerajs_81
Builder

Thanks for responding.  But do i need to use the coalesce command before the stats command ? Or is the coalesce not required at all and i can directly run the stats ?

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

coalesce works on different fields in the same event not the same field in different events

bowesmana
SplunkTrust
SplunkTrust

Although in the case of two data sets, it would be different fields in different events if only one of the fields is present in each data set. Thereby creating the 'join' field.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Coalesce is not needed, there is effectively only one field 'sys_created_by' and it will have a value depending on which index it comes from

Coalesce is used in this scenario, where the field name is different, e.g.

| eval common_field=coalesce(field_from_idx1, field_from_idx2)
| stats xxxx by common_field

but you don't need that.

 

neerajs_81
Builder

bowesmana  / @ITWhisperer 
what about the case, when the common_field changes values across indexes ?
For example:
In Idx1 ,  i have sys_created_by = johnk@domain.com
In idx2,   sys_created_by = some_other_user@domain.com  but user_email=johnk@domain.com 

If i want to do a stats grouping for johnk@domain.com , how to go about this ? The below coalesce command does not show desired results.  

 

 

| eval common_field= coalesce(sys_created_by, user_email) 

 

 

 
common_field ends up showing the value of some_other_user@domain.com .

But  my requirement is to grab johnk@domain.com into some common field and leverage it further to stats group by .
Pls advise.

Tags (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

@neerajs_81 

You need to decide which data (john@) is important and which field that comes from.

If your data has

idx1 (sys_created_by)

idx2 (sys_created_by and user_email)

then IFF user_email contains the DATA that is important to your grouping, then as idx1 does NOT contain that field, you would use any one of these three constructs (all will produce the same outcome)

| eval group_by_field=coalesce(user_email, sys_created_by)
| eval group_by_field=if(isnotnull(user_email), user_email, sys_created_by)
| eval group_by_field=if(index="idx2", user_email, sys_created_by)

Hope this helps

 

Get Updates on the Splunk Community!

Splunk Forwarders and Forced Time Based Load Balancing

Splunk customers use universal forwarders to collect and send data to Splunk. A universal forwarder can send ...

NEW! Log Views in Splunk Observability Dashboards Gives Context From a Single Page

Today, Splunk Observability releases log views, a new feature for users to add their logs data from Splunk Log ...

Last Chance to Submit Your Paper For BSides Splunk - Deadline is August 12th!

Hello everyone! Don't wait to submit - The deadline is August 12th! We have truly missed the community so ...