Hi All,
We have a saved search (snippet below) which populates a CSV lookup file. The search is scheduled to run daily.
index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname), by identity
| rename latest(*) as *
| eval identity = identity+"|"+email+"|"+legacyUsername
What's happening is, every time the search runs it creates duplicates as in each subsequent row appends the results from the previous row to it for the "Identity" column as shown below.
I get that my "eval identity" command is making it do it. But how can i make it not create new records if the "identity" already exists once in the table? Hope i am clear. I tried "Dedup identity" that didn't work.
Result:
Identity | time | legacyuserName | title | enddata | first | lastname | |
1001|karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
1001|karen.woo@xyz.com|karen_woo |karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
1001|karen.woo@xyz.com|karen_woo |karen.woo@xyz.com|karen_woo |karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
1001|karen.woo@xyz.com|karen_woo|karen.woo@xyz.com|karen_woo |karen.woo@xyz.com|karen_woo|karen.woo@xyz.com|karen_woo |karen.woo@xyz.com|karen_woo | xx | Karen | Woo |
The issue is that you are regenerating the 1001|karen.woo@xyz.com from the original as well as the previously generated version, so try repeating the stats and rename to remove the duplicate
index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname), by identity
| rename latest(*) as *
| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname), by identity
| rename latest(*) as *
| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername
@ITWhisperer Tried that, it appears to have stop appending to previous record but the result is still showing duplicates as follows: Basically it resolved the appending piece but is still creating a new record for the same Identity even though the identity exists. How to make it not create duplicates further?
Identity | time | legacyuserName | title | enddata | first | lastname | |
1001|karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
1001|karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
1001|karen.woo@xyz.com|karen_woo | xx | xxxx | xxx | xxx | xx | Karen | Woo |
The issue is that you are regenerating the 1001|karen.woo@xyz.com from the original as well as the previously generated version, so try repeating the stats and rename to remove the duplicate
index=xyz
...
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname), by identity
| rename latest(*) as *
| eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername
| stats latest(_time) as _time, latest(legacyUsername), latest(title), latest(email), latest(endDate), latest(firstname), latest(lastname), by identity
| rename latest(*) as *
Thank you very much. Can you pls explain what does
eval identity=mvindex(split(identity,"|"),0)+"|"+email+"|"+legacyUsername do ?
I did lookup online on mvindex command but what exactly made you suggest split(identity,"|"),0) ?
split breaks the string up using the delimiter "|" into a multi-value field, mvindex takes one of the values, in this case, the first one since indexing begins at zero for this function. So, specifically, mvindex(split("1001|karen.woo@xyz.com|karen_woo","|"),0) equates to "1001". The line then appends the email and legacyUsername back again to recreate the identity.
Hi @neerajs_81,
the only thing I can suppose is that you have more values in the latest fields, so try to use also email and legacyUsername as key in the BY cluase, something like this:
index=xyz
...
| stats
latest(_time) as _time
latest(title) AS title
latest(endDate) AS endDate
latest(firstname) AS firstname
latest(lastname) AS lastname
by identity email legacyUsername
| eval identity = identity."|".email."|".legacyUsername
Ciao.
Giuseppe
Hi @gcusello , Thank you for responding. Tried what you suggested in the stats by clause, but it didn't make any difference to the results.
Adding @ITWhisperer .
Thanks