<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Define results when duplicate events have dissimilar field values in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464834#M130978</link>
    <description>&lt;P&gt;The HR data I'm working with has multiple entries for the same user. The hr_id always starts with an alphabetic character&lt;BR /&gt;
followed by 5 to 7 numeric characters. The Lan_name varies with no discernible structure. Sometimes the &lt;BR /&gt;
Lan_name matches the hr_id. The hr_id is always consistent for each user. The Lan_name value is not always &lt;BR /&gt;
consistent. Unfortunately, for some users the Lan_name field holds the hr_id in some entries and the actual Lan_name in others. &lt;BR /&gt;
The following is an example of my data:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title   Email               Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   ts004       S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   ts004       S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   S12345      S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   S12345      S12345&lt;BR /&gt;
Jones,Jill  job2        &lt;A href="mailto:jjones@domain.com" target="_blank"&gt;jjones@domain.com&lt;/A&gt;   j723b2      j1234567&lt;BR /&gt;
Jones,Jill  job2        &lt;A href="mailto:jjones@domain.com" target="_blank"&gt;jjones@domain.com&lt;/A&gt;   j1234567    j1234567&lt;/P&gt;

&lt;P&gt;Because I dedup on both the Lan_name and the hr_id, sometimes I get the preferred row:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title        Email                      Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1             &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;  ts004       S12345&lt;/P&gt;

&lt;P&gt;But sometimes I get the row where the Lan_name is the same as the hr_id:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title        Email                      Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1             &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;  S12345      S12345&lt;/P&gt;

&lt;P&gt;Keep in mind that the Lan_name and the hr_id are sometimes legitimately the same, for users who are newer to the organization. &lt;BR /&gt;
Could someone show me how to keep the Lan_name that differs from the hr_id whenever a user has one? I can't ask &lt;BR /&gt;
HR to make corrections due to the sheer volume of entries.&lt;/P&gt;

&lt;P&gt;Below is the sample search. I removed most of the noise, leaving the full_name evals and regex because &lt;BR /&gt;
other HR data input is not accurate. I can't use subsearches because I'm using a REST call from a Windows &lt;BR /&gt;
application. I would like to avoid lookup tables if at all possible. Finally, I can't make any changes to any of &lt;BR /&gt;
the configuration files on the Splunk servers. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="hr_index" sourcetype="hr_user_accounts" 
| rename job_title AS Job_Title, email_address AS Email, hr_id as HR_ID
| eval full_name=replace(full_name,"\."," ")
| eval full_name=replace(full_name,"   "," ")
| eval full_name=replace(full_name,"  "," ")
| rex field=full_name "^(?P&amp;lt;full_name&amp;gt;[\d|\s]+)"
| eval tmp=split(full_name,",")
| eval Last_Name=mvindex(tmp,0),First_Name=mvindex(tmp,1) 
| table Last_Name, First_Name, Job_Title, Email, Lan_name, HR_ID
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Wed, 30 Sep 2020 03:21:37 GMT</pubDate>
    <dc:creator>dorgra</dc:creator>
    <dc:date>2020-09-30T03:21:37Z</dc:date>
    <item>
      <title>Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464834#M130978</link>
      <description>&lt;P&gt;The HR data I'm working with has multiple entries for the same user. The hr_id always starts with an alphabetic character&lt;BR /&gt;
followed by 5 to 7 numeric characters. The Lan_name varies with no discernible structure. Sometimes the &lt;BR /&gt;
Lan_name matches the hr_id. The hr_id is always consistent for each user. The Lan_name value is not always &lt;BR /&gt;
consistent. Unfortunately, for some users the Lan_name field holds the hr_id in some entries and the actual Lan_name in others. &lt;BR /&gt;
The following is an example of my data:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title   Email               Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   ts004       S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   ts004       S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   S12345      S12345&lt;BR /&gt;
Smith,Tom   job1        &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;   S12345      S12345&lt;BR /&gt;
Jones,Jill  job2        &lt;A href="mailto:jjones@domain.com" target="_blank"&gt;jjones@domain.com&lt;/A&gt;   j723b2      j1234567&lt;BR /&gt;
Jones,Jill  job2        &lt;A href="mailto:jjones@domain.com" target="_blank"&gt;jjones@domain.com&lt;/A&gt;   j1234567    j1234567&lt;/P&gt;

&lt;P&gt;Because I dedup on both the Lan_name and the hr_id, sometimes I get the preferred row:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title        Email                      Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1             &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;  ts004       S12345&lt;/P&gt;

&lt;P&gt;But sometimes I get the row where the Lan_name is the same as the hr_id:&lt;/P&gt;

&lt;P&gt;full_name   Job_Title        Email                      Lan_name    hr_id&lt;BR /&gt;
Smith,Tom   job1             &lt;A href="mailto:tsmith@domain.com" target="_blank"&gt;tsmith@domain.com&lt;/A&gt;  S12345      S12345&lt;/P&gt;

&lt;P&gt;Keep in mind that the Lan_name and the hr_id are sometimes legitimately the same, for users who are newer to the organization. &lt;BR /&gt;
Could someone show me how to keep the Lan_name that differs from the hr_id whenever a user has one? I can't ask &lt;BR /&gt;
HR to make corrections due to the sheer volume of entries.&lt;/P&gt;

&lt;P&gt;Below is the sample search. I removed most of the noise, leaving the full_name evals and regex because &lt;BR /&gt;
other HR data input is not accurate. I can't use subsearches because I'm using a REST call from a Windows &lt;BR /&gt;
application. I would like to avoid lookup tables if at all possible. Finally, I can't make any changes to any of &lt;BR /&gt;
the configuration files on the Splunk servers. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="hr_index" sourcetype="hr_user_accounts" 
| rename job_title AS Job_Title, email_address AS Email, hr_id as HR_ID
| eval full_name=replace(full_name,"\."," ")
| eval full_name=replace(full_name,"   "," ")
| eval full_name=replace(full_name,"  "," ")
| rex field=full_name "^(?P&amp;lt;full_name&amp;gt;[\d|\s]+)"
| eval tmp=split(full_name,",")
| eval Last_Name=mvindex(tmp,0),First_Name=mvindex(tmp,1) 
| table Last_Name, First_Name, Job_Title, Email, Lan_name, HR_ID
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:21:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464834#M130978</guid>
      <dc:creator>dorgra</dc:creator>
      <dc:date>2020-09-30T03:21:37Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464835#M130979</link>
      <description>&lt;P&gt;Interesting problem! The answer I came up with was essentially taking all values of Lan_name and hr_id (since &lt;CODE&gt;dedup&lt;/CODE&gt; has a lot of the same functionality as &lt;CODE&gt;stats&lt;/CODE&gt; at its base use case), expanding all options (when there is more than one), and then filtering out those that are both duplicates and have the same hr_id as Lan_name. Example below:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4              
| streamstats count                 
| eval Lan_name=case(count=1 OR count=2, "ts004", count=3, "j723b2", count=4, "j1234567")        
| eval hr_id=case(count=1 OR count=2, "S12345", count=3 OR count=4, "j1234567")       
| eval full_name=case(count=1 OR count=2, "TSmith", count=3 OR count=4, "JJones")        
| stats values(Lan_name) as Lan_name, values(hr_id) as hr_id by full_name 
| eval countOfLan_name=mvcount(Lan_name) 
| mvexpand Lan_name 
| eval duplicate=if(countOfLan_name&amp;gt;1 AND Lan_name=hr_id, 1, 0) 
| search duplicate=0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The first 5 lines are for data creation, after which each line does the following:&lt;BR /&gt;
6. Gets the unique list of Lan_name/hr_id values by "group term" (full_name in this case; in your case, whatever is unique per user)&lt;BR /&gt;
7. Figures out whether there are multiple Lan_names&lt;BR /&gt;
8. Splits out the Lan_names, keeping everything else consistent&lt;BR /&gt;
9. Determines whether a row is a duplicate by asking "Are there two rows in my entry? Yes? Is my Lan_name the same as my hr_id?"&lt;BR /&gt;
10. Filters the duplicates out&lt;BR /&gt;
I made a couple of assumptions: that when there are multiple Lan_names you always want the one that differs from hr_id, and that Lan_name is always populated. If Lan_name is not always populated, &lt;CODE&gt;mvexpand&lt;/CODE&gt; will filter that row out, so add in a &lt;CODE&gt;fillnull value="null" Lan_name&lt;/CODE&gt;.&lt;/P&gt;
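
&lt;P&gt;As a rough sketch (not tested against your data), steps 6-10 could be attached to the search from the question as shown here. It assumes hr_id is the grouping key, since the question says it is consistent per user, and it passes Job_Title, Email and full_name through &lt;CODE&gt;stats&lt;/CODE&gt; with &lt;CODE&gt;values()&lt;/CODE&gt; because &lt;CODE&gt;stats&lt;/CODE&gt; keeps only the fields named in it. Keep your full_name clean-up lines ahead of the &lt;CODE&gt;stats&lt;/CODE&gt; line; the &lt;CODE&gt;mvindex(full_name,0)&lt;/CODE&gt; is only a guard in case a user still ends up with more than one spelling.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="hr_index" sourcetype="hr_user_accounts"
| rename job_title AS Job_Title, email_address AS Email
| stats values(Lan_name) AS Lan_name, values(Job_Title) AS Job_Title, values(Email) AS Email, values(full_name) AS full_name BY hr_id
| eval countOfLan_name=mvcount(Lan_name)
| mvexpand Lan_name
| eval duplicate=if(countOfLan_name&amp;gt;1 AND Lan_name=hr_id, 1, 0)
| search duplicate=0
| eval tmp=split(mvindex(full_name,0),",")
| eval Last_Name=mvindex(tmp,0), First_Name=mvindex(tmp,1)
| table Last_Name, First_Name, Job_Title, Email, Lan_name, hr_id
&lt;/CODE&gt;&lt;/PRE&gt;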

&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:18:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464835#M130979</guid>
      <dc:creator>aberkow</dc:creator>
      <dc:date>2020-09-30T03:18:49Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464836#M130980</link>
      <description>&lt;P&gt;Any suggestion on how to pass the values for Lan_name and hr_id to rows 3, 4 and 5? I'm liking this direction. Thanks for getting to it so quickly. &lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:22:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464836#M130980</guid>
      <dc:creator>dorgra</dc:creator>
      <dc:date>2020-09-30T03:22:00Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464837#M130981</link>
      <description>&lt;P&gt;I don't follow your question. What are rows 3, 4 and 5 to you? &lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 23:10:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464837#M130981</guid>
      <dc:creator>aberkow</dc:creator>
      <dc:date>2019-12-11T23:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464838#M130982</link>
      <description>&lt;P&gt;The 3 eval rows:&lt;BR /&gt;
    | eval Lan_name=case(count=1 OR count=2, "ts004", count=3, "j723b2", count=4, "j1234567")&lt;BR /&gt;&lt;BR /&gt;
    | eval hr_id=case(count=1 OR count=2, "S12345", count=3 OR count=4, "j1234567")&lt;BR /&gt;&lt;BR /&gt;
    | eval full_name=case(count=1 OR count=2, "TSmith", count=3 OR count=4, "JJones")    &lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:22:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464838#M130982</guid>
      <dc:creator>dorgra</dc:creator>
      <dc:date>2020-09-30T03:22:03Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464839#M130983</link>
      <description>&lt;P&gt;Oh. That's just my data creation to recreate what you had done above to get to the state you said your output was. If you slap on lines 6-10 to your query that should solve your problem.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 01:48:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464839#M130983</guid>
      <dc:creator>aberkow</dc:creator>
      <dc:date>2019-12-12T01:48:42Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464840#M130984</link>
      <description>&lt;P&gt;This has been very helpful. I can't figure out why this search leaves Job_Title and Email blank. It also shows full_name instead of Last_Name and First_Name. I'm going to go ahead and award the points. You've earned them and you replied so quickly. Very much appreciated. I'm sure the remaining problem is something basic that I'm missing. &lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:22:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464840#M130984</guid>
      <dc:creator>dorgra</dc:creator>
      <dc:date>2020-09-30T03:22:14Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464841#M130985</link>
      <description>&lt;P&gt;Unfortunately, I don't have enough Karma points to award points. Odd that. &lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 15:42:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464841#M130985</guid>
      <dc:creator>dorgra</dc:creator>
      <dc:date>2019-12-12T15:42:52Z</dc:date>
    </item>
    <item>
      <title>Re: Define results when duplicate events have dissimilar field values</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464842#M130986</link>
      <description>&lt;P&gt;Accepting and upvoting the answer is more than enough &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; The reason that is happening is one of the most common Splunk gotchas people run into: transforming commands (such as &lt;CODE&gt;stats&lt;/CODE&gt;) turn raw information (raw logs) into tabled information, and only the fields named in the command survive. Therefore, you need to pass any fields you want later in your search through that line. Check out some of the functions available: &lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.0/SearchReference/Stats#Stats_function_options"&gt;https://docs.splunk.com/Documentation/Splunk/8.0.0/SearchReference/Stats#Stats_function_options&lt;/A&gt;. If it's a single-value field (a 1-to-1 mapping of thing to unique bucket in your split-by), list and values return the same thing. Otherwise, list returns the full list and values returns the unique set. I used values because I wanted all unique values, and you probably want to do the same! Try out the syntax and let me know if you run into issues.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 16:49:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Define-results-when-duplicate-events-have-dissimilar-field/m-p/464842#M130986</guid>
      <dc:creator>aberkow</dc:creator>
      <dc:date>2019-12-12T16:49:54Z</dc:date>
    </item>
  </channel>
</rss>

