Solved: Case sensitivity in left join

shazbot79 · ‎01-28-2021

Hi, I have a join that is failing to pull through all users. I can see that the issue is due to case sensitivity so I tried to convert both sides of the join to lower case but this hasn't worked for all events. My SPL is as follows:

index=prod_end_user_services sourcetype=csvAOVPN 
| dedup UserName 
| eval UserName=lower(UserName) 
| rex field=UserName "(?<UserID>.*)@corp.*" 
| replace *.co.uk WITH *.com IN UserName 
| join UserName type=left 
    [ search index=prod_service_now sourcetype=snow:sys_user_list 
    | fields dv_email dv_user_name sys_id | dedup sys_id
    | rename dv_email as UserName
    | eval UserName=lower(UserName) 
    | table UserName dv_user_name ] 
| fillnull value=NULL 
| eval dv_user_name =if(match(dv_user_name,"NULL"), UserID, dv_user_name) 
| table UserName dv_user_name UserID

Is there something glaringly obvious that I have missed??

Thanks

shazbot79 · ‎01-29-2021

So....it turns out my code was behaving as expected...one of the email addresses in Service Now was incorrect. Now that it has been amended all is working as needed. Thanks

View solution in original post

shazbot79 · ‎01-29-2021

So....it turns out my code was behaving as expected...one of the email addresses in Service Now was incorrect. Now that it has been amended all is working as needed. Thanks

bowesmana · ‎01-28-2021

@shazbot79

There are often potential pitfalls using join in Splunk and it's often a good idea to start with how to use stats to achieve the same thing, which is faster and does not have the same resource limitations as join. There are row limits and timing constraints that can apply to join, which will silently occur and there is no easy way to know that the join is not complete.

However, I can see that you are not doing the replace statement inside the subsearch - could that be the issue.

As far as doing this with a single search, the following may be a suitable replacement using stats - I am not sure I have fully represented your logic correctly, but it will give you a good start.

(index=prod_end_user_services sourcetype=csvAOVPN) OR (index=prod_service_now sourcetype=snow:sys_user_list)
| eval UserName=lower(coalesce(UserName, dv_email))
| rex field=UserName "(?<UserID>.*)@corp.*" 
| eval dv_user_name = if(isnull(dv_user_name), UserID, dv_user_name) 
| stats count by UserName dv_user_name UserID
| replace *.co.uk WITH *.com IN UserName 
| fields - count

Line 2 will create the lowercase UserName field from either UserName or dv_email

The stats statement is basically doing what you were doing with the dedups

Hopefully this helps.

shazbot79 · ‎01-28-2021

Hi, thanks for this. It doesn't achieve what I need. What I need is the complete list of UserName from the csvAOVPN sourcetype. I need to replace co.uk with .com on this field and convert to lowercase. I need to strip out the UserID where the UserName contains @corp. Then for all UserNames that don't have @corp I need to return the userID (dv_user_name) from service now by using the UserName as a lookup on the email address in ServiceNow.

What your code seems to do is bring back the 22 UserNames that have @corp only.

I confess I really can't get my head around how using stats to replace join works. I've tried playing around with your code but can't get it to do anything close to what I need. Could you explain how it works?

Thanks

bowesmana · ‎01-29-2021

@shazbot79

I know you've got your solution, but in terms of the principles behind stats vs join. Consider these two data sets

Set 1

UserName,Field2
user1@abc.co.uk,aaaa
user2@abc.co.uk,bbbb
user2@abc.co.uk,cccc

and then this data set 2

UserName,Fieldx
user1@abc.co.uk,xxxx
user2@abc.co.uk,yyyy
user2@abc.co.uk,zzzz

so, when you have a common field, you can either do the following join

DataSet1 
| join UserName [
  DataSet2
  | Fields UserName Field3
]
| stats values(*) as * by UserName

or using stats

DataSet1 OR DataSet2
| stats values(*) as * by UserName

Will achieve the same thing, but without the cost/limitations of using a subsearch.

Case sensitivity in left join

Other

saved search

Splunk Enterprise Security 8.0.2 Availability: On cloud and On-premise!

Logs to Metrics

Developer Spotlight with Paul Stout