Solved: How to dedup multivalued fields?

UMDTERPS · ‎01-25-2021

Some of the data coming in from one of our indexes looks like the following (the data appears to repeat in each field):

ip User System
192.168.1.1 192.168.1.1 BOB BOB ABC ABC

How can I get the data to show only one value per field, i.e. stop the same data repeating in each field?

ip User System
192.168.1.1 BOB ABC

Dedup obviously won't work in this instance.

scelikok · ‎01-26-2021

Hi @UMDTERPS,

If the field values are multivalue, you can use the workaround below for a few fields.

| eval ip=mvindex(split(ip," "),0)
| eval User=mvindex(split(User," "),0)
| eval System=mvindex(split(System," "),0)

If this reply helps you, an upvote and "Accept as Solution" is appreciated.

UMDTERPS · ‎01-26-2021

I'm still getting the same IP address repeated for each field when doing

| eval ip=mvindex(split(ip," "),0)
| eval User=mvindex(split(User," "),0)
| eval System=mvindex(split(System," "),0)

ip
198.168.1.1
198.168.1.1

Weird. I wonder if something is off with the data?

scelikok · ‎01-26-2021

What if we do not split?

| eval ip=mvindex(ip,0)
| eval User=mvindex(User,0)
| eval System=mvindex(System,0)

UMDTERPS · ‎01-26-2021

So, we believe the data coming in from the indexer has some sort of line break, and so splitting the fields on a space won't work. I talked to another engineer at work and he said it may require a regex statement. I'll keep this thread updated.

bowesmana · ‎01-26-2021

You should be able to use replace+regex to change that line break to a space and then split/dedup on that, e.g.

| eval ip=mvdedup(split(replace(ip, "\n", " "), " "))

UMDTERPS · ‎01-27-2021

This worked!

| eval ip=mvdedup(split(replace(ip, "\n", " "), " "))

An engineer at work gave me this (yours is better):

|rex mode=sed "s/([0-9\.]+)\n.*/\1/g" field=ip

However, it only works for the ip field, and you would have to create a custom regex for each field. I will have to get with the admin to fix the data coming in. Also, we had an issue with how the data was displayed in each field: the values were stacked on separate lines, which made the results look like one giant column. This was the fix:

|eval ip = replace(ip, "\n", " ")
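If the same cleanup is needed on several fields, one possible sketch is to wrap the accepted eval in foreach so the expression is written only once. The field list (ip, User, System) is taken from the question. The makeresults and urldecode lines are assumptions added here only to fabricate a test event with an embedded line break (%0A); drop them when running against real data.

```spl
| makeresults
| eval ip=urldecode("192.168.1.1%0A192.168.1.1"), User=urldecode("BOB%0ABOB"), System=urldecode("ABC%0AABC")
| foreach ip User System
    [ eval <<FIELD>>=mvdedup(split(replace('<<FIELD>>', "\n", " "), " ")) ]
| table ip User System
```

Each listed field should come back with one value. Because the values are split on a space, this sketch would also break apart any value that legitimately contains spaces, so it only suits fields like the ones shown here.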

scelikok · ‎01-26-2021

If you can provide a few sample events, we can help better.

FelixLeh · ‎01-26-2021

Hey,
I'm relatively new to Splunk so I don't know if there is a more elegant way to do this but the following code should work just fine:

| makemv ip
| makemv User
| makemv System
| mvexpand ip
| mvexpand User
| mvexpand System
| dedup User ip System

This should output a row for every combination in your source, excluding the duplicates.
If the fields are already multivalue, you can skip all the makemv lines!

UMDTERPS · ‎01-26-2021

Unfortunately that does not work. 🙁

ITWhisperer · ‎01-26-2021

Are you saying that the indexer has created a multivalue field with duplicate values in it for some (or all?) of your events, or are these multivalue fields the result of a search query?

UMDTERPS · ‎01-26-2021

That I'm not sure about; there could be an issue with how the data is getting into or out of the indexer. I don't have admin rights (I'm not the admin), but this issue is preventing me from doing lookups and/or joins on the data with CSVs.
