Splunk Search

How to dedup multivalued fields?

UMDTERPS
Communicator

Some of the data coming in from one of our indexes looks like the following (the data appears to repeat within each field):


ip                         User       System
192.168.1.1 192.168.1.1    BOB BOB    ABC ABC

How can I get the data to show only one value per field (that is, stop the same data from repeating in each field)?

ip             User    System
192.168.1.1    BOB     ABC

Dedup obviously won't work in this instance. 
 

scelikok
Champion

Hi @UMDTERPS,

If the field values are multivalue, you can use the workaround below for a few fields.

| eval ip=mvindex(split(ip," "),0)
| eval User=mvindex(split(User," "),0)
| eval System=mvindex(split(System," "),0)

 


UMDTERPS
Communicator

I'm still getting the same IP address repeated in the field when running:

| eval ip=mvindex(split(ip," "),0)
| eval User=mvindex(split(User," "),0)
| eval System=mvindex(split(System," "),0)


ip
198.168.1.1
198.168.1.1


Weird. I wonder if something is off with the data?


scelikok
Champion

What if we do not split?

| eval ip=mvindex(ip,0)
| eval User=mvindex(User,0)
| eval System=mvindex(System,0)

UMDTERPS
Communicator

So, we believe the data coming in from the indexer contains some sort of line break, which is why splitting the fields on a space won't work. I talked to another engineer at work and he said it may require a regex. I'll keep this thread updated.


bowesmana
Champion

You should be able to use replace+regex to change that line break to a space and then split/dedup on that, e.g.

| eval ip=mvdedup(split(replace(ip, "\n", " "), " "))
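To see what this expression does, here is a small self-contained sketch that can be pasted into a search bar. It assumes the field is a single string with an embedded line break rather than a true multivalue field. The sample value, the helper field nl, and the output field ip_count are made up for illustration; urldecode("%0A") is used only to build a line break for the test event.

```
| makeresults
| eval nl=urldecode("%0A")
| eval ip="192.168.1.1".nl."192.168.1.1"
| eval ip=mvdedup(split(replace(ip, "\n", " "), " "))
| eval ip_count=mvcount(ip)
| table ip ip_count
```

The replace call turns the line break into a space, split turns the string into a multivalue field, and mvdedup drops the repeated value, so ip_count should come out as 1 if the data matches the assumption.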


UMDTERPS
Communicator

This worked!

 

| eval ip=mvdedup(split(replace(ip, "\n", " "), " "))

 

An engineer at work gave me this (yours is better):

 

|rex mode=sed "s/([0-9\.]+)\n.*/\1/g" field=ip

 

However, it only works for the ip field, and you would have to create a custom regex for each field. I will have to work with the admin to fix the data coming in. We also had an issue with how the data was displayed in each field: it looked like one giant column. This was the fix:

 

|eval ip = replace(ip, "\n", " ")
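Since the rex approach needs a separate pattern per field, one possible way to apply the replace/split/mvdedup expression to several fields at once is the foreach command. This is only a sketch: it assumes the field names ip, User and System from the original question, and that each field is a single string whose repeats are separated by spaces or line breaks. It uses "\s+" instead of "\n" so that any run of whitespace is collapsed to one space before splitting.

```
| foreach ip User System
    [ eval <<FIELD>>=mvdedup(split(replace('<<FIELD>>', "\s+", " "), " ")) ]
```

Note that this splits on every space, so a legitimate value that contains a space (for example a two-word user name) would be broken apart as well.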

 


scelikok
Champion

If you can provide a few sample events, we can help better.


FelixLeh
Path Finder

Hey,
I'm relatively new to Splunk, so I don't know if there is a more elegant way to do this, but the following code should work just fine:

| makemv ip
| makemv User
| makemv System
| mvexpand ip
| mvexpand User
| mvexpand System
| dedup User ip System

This should output a row for every combination in your source, excluding the duplicates.
If the fields are already multivalue, you can skip all the makemv lines!


UMDTERPS
Communicator

Unfortunately that does not work. 🙁


ITWhisperer
Ultra Champion

Are you saying that the indexer has created a multivalue field with duplicate values in for some (or all?) of your events, or are these multivalue fields the result of a search query?
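One rough way to tell these cases apart from the search bar, without admin rights, is to count the values in the field and check for a line break. This is a sketch, not a definitive test: the output field name ip_kind and its labels are made up for illustration, and it only inspects the ip field.

```
| eval ip_kind=case(mvcount(ip)>1, "multivalue", match(ip, "[\r\n]"), "single value with line break", isnotnull(ip), "single value")
| stats count by ip_kind
```

A result of "multivalue" would suggest mvdedup(ip) alone is enough, while "single value with line break" would suggest the value needs to be split before deduplicating.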


UMDTERPS
Communicator

I'm not sure about that; there could be an issue with how the data is getting into or out of the indexer. I don't have admin rights (I'm not the admin), but this issue is preventing me from doing lookups and/or joins on the data with CSVs.
