Hi,
I'm not sure if this is the 'best' way, as it feels somewhat overly complicated, but I think I can achieve close to what you're looking for with the following...
This first bit of code is just to simulate your data:
| makeresults
| fields - _time
| eval Sender="xxx@gmail.com", Subject="something", Date="3/1/2017", ToRecipient="xxx@hotmail.com", CcRecipient="xxx@msn.com", BccRecipient="xxx@wa.com"
| eval CcRecipient=mvappend(CcRecipient,"xxx@yahoo.com","xxx@splunk.com")
| append
[| makeresults
| fields - _time
| eval Sender="yyy@gmail.com", Subject="something else", Date="3/1/2017", ToRecipient="yyy@hotmail.com", CcRecipient="yyy@msn.com", BccRecipient="yyy@wa.com"
| eval CcRecipient=mvappend(CcRecipient,"yyy@yahoo.com","yyy@splunk.com")]
This gives 2 results like this:
Next we need some sort of unique ID for each result. You may well have this in your original data, but if not, you can just hash some (or all) fields:
| eval email_id=sha256(Sender.Subject.Date)
Then we add the 'type' to the different address, so that when we manipulate these address later, we don't lose what track of what they were:
| rex max_match=0 mode=sed field=CcRecipient "s/(.*)/\1|Cc/g"
| rex max_match=0 mode=sed field=BccRecipient "s/(.*)/\1|Bcc/g"
| rex max_match=0 mode=sed field=ToRecipient "s/(.*)/\1|To/g"
| rex max_match=0 mode=sed field=Sender "s/(.*)/\1|Sender/g"
Next, put all of the addresses together into a single field and drop the other fields that we don't need:
| eval AllEmails=mvappend(ToRecipient,CcRecipient,BccRecipient,Sender)
| fields - ToRecipient,CcRecipient,BccRecipient,Sender,Date,Subject
Now we want to create a result for each individual email address. This is where the 'email_id' field is useful, so that we have an association between the addresses of individual emails:
| mvexpand AllEmails
And we can now create a 'Type' field to represent if the address was Sender,To,Cc or Bcc. And at the same time, tidy up the original fields:
| rex field=AllEmails "^[^|]+\|(?<Email_Type>(?:To|Cc|Bcc|Sender))"
| rex field=AllEmails mode=sed "s/([^|]+).*/\1/g"
At this point, you should now have a result for every single address.
Now the fun bit. Since you want to associate some data that you've already got, with itself, we can use an appendpipe and a bit of stats to 'duplicate' the results. At the same time, we rename a number of fields to Email2 and Email2_Type:
| appendpipe
[| stats list(AllEmails) AS Email2 list(Email_Type) AS Email2_Type list(email_id) AS email_id by AllEmails
| fields - AllEmails]
Now we use a selfjoin on the email_id to match results from the first part with the second part. Importantly we need to max=0 to ensure that there is an 'everything-to-everything' result:
| selfjoin max=0 email_id
Finally we rename some of the original fields to make more sense. And we filter out the results which didn't have a join, plus a bit of sorting:
| rename AllEmails AS Email1 Email_Type AS Email1_Type
| where isnotnull(Email2) AND Email1!=Email2
| sort + email_id,Email1,Email1_Type,Email2,Email2_Type
And finally, we should have the table which you were looking for:
At this point you can drop the email_id field as it's no longer needed.
IMPORTANT: There is still one problem with this. It doesn't yet deal with the 'inverse' duplication problem.
I.e. Where you have:
Email1 Email1_Type Email2 Email2_Type
yyy@gmail.com Sender yyy@hotmail.com To
yyy@hotmail.com To yyy@gmail.com Sender
Sorry - I ran out of time for that bit, but if I get a chance, I'll see if I can figure it out later!
Hope that helps.
Below is the full search, so it's easier to copy and paste:
| makeresults
| fields - _time
| eval Sender="xxx@gmail.com", Subject="something", Date="3/1/2017", ToRecipient="xxx@hotmail.com", CcRecipient="xxx@msn.com", BccRecipient="xxx@wa.com"
| eval CcRecipient=mvappend(CcRecipient,"xxx@yahoo.com","xxx@splunk.com")
| append
[| makeresults
| fields - _time
| eval Sender="yyy@gmail.com", Subject="something else", Date="3/1/2017", ToRecipient="yyy@hotmail.com", CcRecipient="yyy@msn.com", BccRecipient="yyy@wa.com"
| eval CcRecipient=mvappend(CcRecipient,"yyy@yahoo.com","yyy@splunk.com")]
| eval email_id=sha256(Sender.Subject.Date)
| rex max_match=0 mode=sed field=CcRecipient "s/(.*)/\1|Cc/g"
| rex max_match=0 mode=sed field=BccRecipient "s/(.*)/\1|Bcc/g"
| rex max_match=0 mode=sed field=ToRecipient "s/(.*)/\1|To/g"
| rex max_match=0 mode=sed field=Sender "s/(.*)/\1|Sender/g"
| eval AllEmails=mvappend(ToRecipient,CcRecipient,BccRecipient,Sender)
| fields - ToRecipient,CcRecipient,BccRecipient,Sender,Date,Subject
| mvexpand AllEmails
| rex field=AllEmails "^[^|]+\|(?<Email_Type>(?:To|Cc|Bcc|Sender))"
| rex field=AllEmails mode=sed "s/([^|]+).*/\1/g"
| appendpipe
[| stats list(AllEmails) AS Email2 list(Email_Type) AS Email2_Type list(email_id) AS email_id by AllEmails
| fields - AllEmails]
| selfjoin max=0 email_id
| rename AllEmails AS Email1 Email_Type AS Email1_Type
| where isnotnull(Email2) AND Email1!=Email2
| sort + email_id,Email1,Email1_Type,Email2,Email2_Type
... View more