HR data I'm working with has multiple entries for the same user. The hr_id always starts with an Alpha character
followed by from 5-7 numeric characters. The Lan_name varies with no discernable structure. Sometimes, the
Lan_name matches the hr_id. The hr_id is always consistent for each user. The Lan_name value is not always
consistent. Unfortunately, some users show both the hr_id and Lan_name in these multiple entries.
The following is an example of my data:
full_name Job_Title Email Lan_name hr_id
Smith,Tom job1 tsmith@domain.com ts004 S12345
Smith,Tom job1 tsmith@domain.com ts004 S12345
Smith,Tom job1 tsmith@domain.com S12345 S12345
Smith,Tom job1 tsmith@domain.com S12345 S12345
Jones,Jill job2 jjones@domain.com j723b2 j1234567
Jones,Jill job2 jjones@domain.com j1234567 j1234567
Because I dedup both the Lan_name and the hr_id, sometimes I get the preferred
full_name Job_Title Email Lan_name hr_id
Smith,Tom job1 tsmith@domain.com ts004 S12345
But sometimes I get the row that the Lan_name is the same as the hr_id.
full_name Job_Title Email Lan_name hr_id
Smith,Tom job1 tsmith@domain.com S12345 S12345
Keep in mind, sometimes the Lan_name and the hr_id is the same due to users that are newer to the organization.
Can someone could show me how use the Lan_name for Lan_name when it is dissimilar to the hr_id? I can't request
HR make corrections due to the sheer volume of entries.
Below is the sample sample search. I removed most of the noise leaving the full_name evals and regex because of
other HR data input that is not accurate. I can't use subsearches because I'm using a REST call from a windows
application. I would like to avoid lookup tables if at all possible. Finally, I can't make any changes to any of
configuration files on the Splunk servers.
index="hr_index" sourcetype="hr_user_accounts"
| rename job_title AS Job_Title, email_address AS Email, hr_id as HR_ID
| eval full_name=replace(full_name,"\."," ")
| eval full_name=replace(full_name," "," ")
| eval full_name=replace(full_name," "," ")
| rex field=full_name "^(?P<full_name>[\d|\s]+)"
| eval tmp=split(full_name,",")
| eval Last_Name=mvindex(tmp,0),First_Name=mvindex(tmp,1)
| table Last_Name, First_Name, Job_Title, Email, Lan_name, HR_ID
... View more