Splunk Enterprise Security

How to combine rows with overlapping MV values

sonydrew
Explorer

I have data from a couple different sources that I am trying to combine together into coherent results. The issue I am running into is that sometimes the data does not line up perfectly. Both data sources will report on a user and try to list all their email aliases but sometimes they are incomplete lists and only partially overlap. So we end up with multiple rows that represent the same user but and have most of the same values for the email field, but because they are not exactly the same, when I try to group by email address it doesn't work out how I would hope.
I included some example SPL below to illustrate what the data looks like. There are also some other fields in results, but those cannot be used for merging results either as the email address of the user is the only field that is in both data sets.

| makeresults 
| eval email =split("1@example.com,2@example.com;2@example.com,3@example.com;4@example.com;5@example.com", ";")
| mvexpand email
| eval email=split(email, ",")
| streamstats count as orig_row

alt text
So I am wondering if there is any way to combine rows #1 and 2 in the example results while leaving rows 3 and 4 intact?

Thanks!

0 Karma
1 Solution

to4kawa
Ultra Champion

UPDATED:

| makeresults 
| eval email =split("1@example.com,2@example.com;2@example.com,3@example.com;4@example.com;5@example.com", ";") 
| mvexpand email 
| eval email=split(email, ",") 
| streamstats count as orig_row 
| mvexpand email 
| eventstats list(email) as target 
| streamstats current=f count as ses 
| eval target=if(ses=0, mvindex(target,1, -1),mvappend(mvindex(target,0,ses-1),mvindex(target,ses+1,-1)))
| eval checked=if(match(target,email),1,0)
| stats list(email) as email sum(checked) as checked by orig_row
| where checked > 0

View solution in original post

to4kawa
Ultra Champion

UPDATED:

| makeresults 
| eval email =split("1@example.com,2@example.com;2@example.com,3@example.com;4@example.com;5@example.com", ";") 
| mvexpand email 
| eval email=split(email, ",") 
| streamstats count as orig_row 
| mvexpand email 
| eventstats list(email) as target 
| streamstats current=f count as ses 
| eval target=if(ses=0, mvindex(target,1, -1),mvappend(mvindex(target,0,ses-1),mvindex(target,ses+1,-1)))
| eval checked=if(match(target,email),1,0)
| stats list(email) as email sum(checked) as checked by orig_row
| where checked > 0

sonydrew
Explorer

After your update the query is close. It identifies which rows have overlap, I just need to combine those correctly. I think I can manage that part. Will post back here if I get that working but will mark your answer as correct as well.

| makeresults 
| eval email =split("1@example.com,2@example.com;2@example.com,3@example.com;4@example.com;5@example.com", ";") 
| mvexpand email 
| eval email=split(email, ",") 
| streamstats count as orig_row 

| mvexpand email 
| eventstats list(email) as target 
| streamstats current=f count as ses 
| eval target=if(ses=0, mvindex(target,1, -1),mvappend(mvindex(target,0,ses-1),mvindex(target,ses+1,-1))) 
| eval checked=if(match(target,email),1,0) 
| stats list(email) as email sum(checked) as checked, count(orig_row) as oc by orig_row 
| where checked > 0 OR oc=1
0 Karma

sonydrew
Explorer

Looks like 'checked' is not getting a value for me. Perhaps it is a difference in the version of splunk I am on (7.1.6)? But I suspect that match function is not working properly with the multi-value field.

0 Karma

to4kawa
Ultra Champion
But I suspect that match function is not working properly with the multi-value field.

No. match() works. both single and multivalue.

0 Karma

to4kawa
Ultra Champion

@sonydrew
I have a mistake. my answer is updated. please confirm.

0 Karma

acfecondo75
Path Finder

If you want the full range of email values, you'll need a different field to do an exact match on. Is there a way you could create the user field (or another common value that you could group on) either with a rex or an eval? Maybe one of the email addresses is username@example.com and you could extract the user field with regex

0 Karma

sonydrew
Explorer

Well, I don't have the actual user account name available. I could rex out the user portion of the email address (everything before the @), but then I would still end up with two Multivalue rows that I am trying to combine, just with shorter strings. For example if I have primary email address of "john_smith@example.com" and an alias of "john_s@example.com", I could rex out "john_smith" and "john_s" but I don't think that helps.

0 Karma

acfecondo75
Path Finder

you could try writing the regex more specifically so it only matches the full username syntax. maybe something like this: ^(?<user>[a-z]+\_[a-z]{2,})\@ That would match john_smith@example.com but not john_s@example.com. might need to be tweaked depending on the naming conventions

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...