Solved: How to make a Search NOT append results from previ...

neerajs_81 · ‎02-13-2022

Hi All,

We have a saved search (snippet below) which populates a CSV lookup file. The search is scheduled to run daily.

index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname),  by identity
| rename latest(*) as *
| eval identity = identity+"|"+email+"|"+legacyUsername

What's happening is, every time the search runs it creates duplicates as in each subsequent row appends the results from the previous row to it for the "Identity" column as shown below.
I get that my "eval identity" command is making it do it. But how can i make it not create new records if the "identity" already exists once in the table? Hope i am clear. I tried "Dedup identity" that didn't work.

Result:

Identity	time	legacyuserName	title	email	enddata	first	lastname
1001\|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo
1001\|karen.woo@xyz.com\|karen_woo \|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo
1001\|karen.woo@xyz.com\|karen_woo \|karen.woo@xyz.com\|karen_woo \|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo
1001\|karen.woo@xyz.com\|karen_woo\|karen.woo@xyz.com\|karen_woo \|karen.woo@xyz.com\|karen_woo\|karen.woo@xyz.com\|karen_woo \|karen.woo@xyz.com\|karen_woo					xx	Karen	Woo

ITWhisperer · ‎02-13-2022

The issue is that you are regenerating the 1001|karen.woo@xyz.com from the original as well as the previously generated version, so try repeating the stats and rename to remove the duplicate

index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname),  by identity
| rename latest(*) as *
| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname),  by identity
| rename latest(*) as *

View solution in original post

ITWhisperer · ‎02-13-2022

| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername

neerajs_81 · ‎02-13-2022

@ITWhisperer Tried that, it appears to have stop appending to previous record but the result is still showing duplicates as follows: Basically it resolved the appending piece but is still creating a new record for the same Identity even though the identity exists. How to make it not create duplicates further?

Identity	time	legacyuserName	title	email	enddata	first	lastname
1001\|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo
1001\|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo
1001\|karen.woo@xyz.com\|karen_woo	xx	xxxx	xxx	xxx	xx	Karen	Woo

ITWhisperer · ‎02-13-2022

The issue is that you are regenerating the 1001|karen.woo@xyz.com from the original as well as the previously generated version, so try repeating the stats and rename to remove the duplicate

index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname),  by identity
| rename latest(*) as *
| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname),  by identity
| rename latest(*) as *

neerajs_81 · ‎02-14-2022

Thank you very much. Can you pls explain what does
eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername do ?

I did lookup online on mvindex command but what exactly made you suggest split(identity,"|"),0) ?

ITWhisperer · ‎02-14-2022

split breaks the string up using the delimiter "|" into a multi-value field, mvindex takes one of the values, in this case, the first one since indexing begins at zero for this function. So, specifically, mvindex(split("1001|karen.woo@xyz.com|karen_woo","|"),0) equates to "1001". The line then appends the email and legacyUsername back again to recreate the identity.

gcusello · ‎02-13-2022

Hi @neerajs_81,

the only thing I can suppose is that you have more values in the latest fields, so try to use also email and legacyUsername as key in the BY cluase, something like this:

index=xyz
...
| stats 
   latest(_time) as _time
   latest(title) AS title
   latest(endDate) AS endDate
   latest(firstname) AS firstname
   latest(lastname) AS lastname
   by identity email legacyUsername
| eval identity = identity."|".email."|".legacyUsername

Ciao.

Giuseppe

neerajs_81 · ‎02-13-2022

Hi @gcusello , Thank you for responding. Tried what you suggested in the stats by clause, but it didn't make any difference to the results.

neerajs_81 · ‎02-13-2022

Adding @ITWhisperer .
Thanks

isoutamo · ‎02-13-2022

Can you just check if identity in not null/“” (empty) and then add those to it?

How to make a Search NOT append results from previous row if record already exists ?

fields

stats

subsearch

What the End of Support for Splunk Add-on Builder Means for You

Solve, Learn, Repeat: New Puzzle Channel Now Live

Building Reliable Asset and Identity Frameworks in Splunk ES

Are you a member of the Splunk Community?