- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
I have a field called "User" that contains similar values and I was wondering how to remove or merge similar values?
For example: "Tony W" and "Anthony W" (both values of the same field) should be merged together.
I was looking at the fuzzy search and jellyfish apps on SplunkBase, but couldn't find a solution to the problem.
My search query:
(index="abc" Name=*) OR (index="xyz" department=* displayName=*)
| eval User=if(isnull(Name), upper(displayName), upper(Name))
| stats values(department) as department by User
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Unless you want to delve into the language processing wonderland, your best bet is to create a lookup and do it yourself. Something like
shortened | spelled |
tony | antony |
nick | nicolas |
nick | nikola |
nick | nocole |
sam | samuel |
sam | samantha |
bill | william |
will | william |
Let's call this table nicknames. Then,
(index="abc" Name=*) OR (index="xyz" department=* displayName=*)
| eval User=if(isnull(Name), lower(displayName), lower(Name)) ``` lower or upper depends on lookup table design ```
| eval User = split(User, "\s+")
| eval firstName = mvindex(User, 0), lastName = mvindex(User, -1), middleInit = if(mvcount(User) > 2, mvindex(User, 1, -2), null())
| lookup nicknames shortened AS firstName output spelled
| lookup nicknames spelled AS firstName output shortened
| lookup nicknames spelled output shortened AS shortened2 ``` handle bill and will ```
| eval firstName = mvdedup(mvappend(firstName, spelled, shortened, shortened2))
| eval firstName = upper(mvjoin(mvsort(firstName), "/")) ``` upper wouldn't be needed if lookup table is in upper ```
| stats values(department) as department values(User) by firstName middleInit lastName ``` values(User) retains original data ```
Note that I chose to list a bunch of nicknames with ambiguity. The only sensible way to handle them is to admit the uncertainty and retain data.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Unless you want to delve into the language processing wonderland, your best bet is to create a lookup and do it yourself. Something like
shortened | spelled |
tony | antony |
nick | nicolas |
nick | nikola |
nick | nocole |
sam | samuel |
sam | samantha |
bill | william |
will | william |
Let's call this table nicknames. Then,
(index="abc" Name=*) OR (index="xyz" department=* displayName=*)
| eval User=if(isnull(Name), lower(displayName), lower(Name)) ``` lower or upper depends on lookup table design ```
| eval User = split(User, "\s+")
| eval firstName = mvindex(User, 0), lastName = mvindex(User, -1), middleInit = if(mvcount(User) > 2, mvindex(User, 1, -2), null())
| lookup nicknames shortened AS firstName output spelled
| lookup nicknames spelled AS firstName output shortened
| lookup nicknames spelled output shortened AS shortened2 ``` handle bill and will ```
| eval firstName = mvdedup(mvappend(firstName, spelled, shortened, shortened2))
| eval firstName = upper(mvjoin(mvsort(firstName), "/")) ``` upper wouldn't be needed if lookup table is in upper ```
| stats values(department) as department values(User) by firstName middleInit lastName ``` values(User) retains original data ```
Note that I chose to list a bunch of nicknames with ambiguity. The only sensible way to handle them is to admit the uncertainty and retain data.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
