Splunk Search

How to find duplicate values in a multivalue field?

jovi
New Member

Hi
I have logs in Splunk containing lines like this: UserPolicies=13=5|0=81540803|7=137|9=76|13=3|1=11|21=10
UserPolicies are multivalue. The first numer (before=) represents a type and the last number (after=) represents a name, and they are separated by |.

How do i find duplicate values (ie 13 in the example above) in the UserPolicies field?

I'm able to separate the values using makemv but are not able to further narrow my search to only list entries with duplicate type values..

0 Karma
1 Solution

DalJeanis
Legend

IMPROVED VERSION

This one is more efficient than the one I posted below. It dedups the type mv, and then if it doesn't have the same number of policies, you know one was a dup.

| makeresults | eval UserID="blah" | eval UserPolicies="13=5|0=81540803|7=137|9=76|13=3|1=11|21=10" 

| makemv UserPolicies delim="|" 
| rex field=UserPolicies "(?<policyType>[^=]+)=(?<policyValues>.+)" 
| eval policyType=mvdedup(policyType)
| where mvcount(policyType)!=mvcount(policyValues)

You don't know WHICH one was a dup, but you can then run only the users that have dups through the other version of the code, as follows.

 | eval policyType=""
 | rex field=UserPolicies "(?<policyType>[^=]+)=" 
 | mvexpand policyType 
 | stats values(*) as * count as dupcount by UserID policyType 
 | where dupcount>1

OLDER VERSION

Here's an example of one method, The first line just creates the test data.

| makeresults | eval UserID="blah" | eval UserPolicies="13=5|0=81540803|7=137|9=76|13=3|1=11|21=10" 

| makemv UserPolicies delim="|" 
| rex field=UserPolicies "(?<policyType>[^=]+)=" 
| mvexpand policyType 
| stats values(*) as * count as dupcount by UserID policyType 
| where dupcount>1

giving results that look like this

UserID        policyType    UserPolicies  dupcount      
blah          13            0=81540803    2             
                            13=3                        
                            13=5                        
                            1=11                        
                            21=10                       
                            7=137                       
                            9=76                        

Note that the stats/values command has sorted the order of the policyType pairs.

View solution in original post

0 Karma

DalJeanis
Legend

IMPROVED VERSION

This one is more efficient than the one I posted below. It dedups the type mv, and then if it doesn't have the same number of policies, you know one was a dup.

| makeresults | eval UserID="blah" | eval UserPolicies="13=5|0=81540803|7=137|9=76|13=3|1=11|21=10" 

| makemv UserPolicies delim="|" 
| rex field=UserPolicies "(?<policyType>[^=]+)=(?<policyValues>.+)" 
| eval policyType=mvdedup(policyType)
| where mvcount(policyType)!=mvcount(policyValues)

You don't know WHICH one was a dup, but you can then run only the users that have dups through the other version of the code, as follows.

 | eval policyType=""
 | rex field=UserPolicies "(?<policyType>[^=]+)=" 
 | mvexpand policyType 
 | stats values(*) as * count as dupcount by UserID policyType 
 | where dupcount>1

OLDER VERSION

Here's an example of one method, The first line just creates the test data.

| makeresults | eval UserID="blah" | eval UserPolicies="13=5|0=81540803|7=137|9=76|13=3|1=11|21=10" 

| makemv UserPolicies delim="|" 
| rex field=UserPolicies "(?<policyType>[^=]+)=" 
| mvexpand policyType 
| stats values(*) as * count as dupcount by UserID policyType 
| where dupcount>1

giving results that look like this

UserID        policyType    UserPolicies  dupcount      
blah          13            0=81540803    2             
                            13=3                        
                            13=5                        
                            1=11                        
                            21=10                       
                            7=137                       
                            9=76                        

Note that the stats/values command has sorted the order of the policyType pairs.

0 Karma

jovi
New Member

Hi
Thanks for the quick reply!

I was trying something like this
| makemv delim="|" UserPolicies
| rex field=UserPolicies mode=sed "s/=.{1,9}//"
but wasnt able to further evaluate the duplicate entries..

Both of your suggestions works, but do you know of a way to list out only the affected userid and the duplicated policytype?

0 Karma

aaraneta_splunk
Splunk Employee
Splunk Employee

@jovi - If the answer provided by DalJeanis provided a working solution to your original question, please don't forget to resolve this post by clicking "Accept". Thank you.

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...