Splunk Search

How do I extract an email address from raw data using regex?

dannili
Communicator

Hi all, I have some raw data that looks like this (only part of an event is shown):

....."","10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM","natalie.someone@email.com","andrew.someone@email.com","UCCAPI/3823.323.10827.* OC/16.0.10827.20150 (Skype for Business)","UCCAPI/16.0.10730.2342342 OC/16.0.10730.20088 (Skype for Business)","","","","****-5042-5F76-A879-***7","","","","","200","[IM]","{""RequestType"":""BYE"",""RequestTime"":""2018-10-30T07:41:52.2147589"",""ContentType"":"""",""ResponseCode"":""200"",..

I want to extract the two email addresses from each raw event (natalie.someone@email.com and andrew.someone@email.com in this case) into two fields, caller_email and receiver_email.

Does anyone know how to do this? Thanks a lot!

0 Karma
1 Solution

FrankVl
Ultra Champion

Try this regex: \"(?<caller>[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5})\",\"(?<receiver>[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5})\"
https://regex101.com/r/wsaYMy/1/
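
For anyone who wants to check the pattern outside Splunk, here is a minimal sketch in Python. It is not Splunk syntax: Python's re module writes named groups as (?P<name>...) instead of (?<name>...), and the sample string is a shortened fragment of the event from the question. In a search, the original regex would typically be passed to the rex command.

```python
import re

# One email address, using the same character classes as the regex above.
EMAIL = r'[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}'

# Two consecutive quoted email addresses separated by a comma.
# Python spells named groups (?P<name>...), unlike PCRE's (?<name>...).
PAIR = re.compile(r'"(?P<caller>' + EMAIL + r')","(?P<receiver>' + EMAIL + r')"')

# Shortened fragment of the event shown in the question.
raw = ('"10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
       '"natalie.someone@email.com","andrew.someone@email.com",'
       '"UCCAPI/16.0.10730.2342342 OC/16.0.10730.20088 (Skype for Business)"')

match = PAIR.search(raw)
fields = match.groupdict() if match else {}
print(fields)
```

The pattern does not depend on where the two addresses sit in the event, only on them being adjacent quoted values.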

That said, it might be worth investing some time in defining a proper delimiter-based (DELIMS) extraction for the entire event.
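
To illustrate the idea behind a delimiter-based extraction, here is a rough sketch in Python that splits the event on commas and respects the quoting, instead of pattern-matching individual values. The column names are hypothetical (the question does not show the real field list), and the sample is a shortened fragment of the event, so treat this as an illustration rather than the actual layout.

```python
import csv
import io

# Hypothetical column names for a shortened fragment of the event.
# The real event has more columns than the question shows.
COLUMNS = ["start_time", "end_time", "caller_email", "receiver_email",
           "response_code", "media", "request"]

# Quoted, comma-separated values; embedded quotes are doubled ("").
raw = ('"10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
       '"natalie.someone@email.com","andrew.someone@email.com",'
       '"200","[IM]","{""RequestType"":""BYE""}"')

# csv.reader handles the quoting and un-doubles the embedded quotes.
row = next(csv.reader(io.StringIO(raw)))
event = dict(zip(COLUMNS, row))

print(event["caller_email"], event["receiver_email"])
print(event["request"])
```

Once every column has a name, the email fields come for free and so does everything else in the event.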

0 Karma

willymac650
New Member

I have a very similar question, although I could have one, two, or three email addresses in the raw data. If I use the answer below, I get results only when there are exactly two email addresses; if I add another copy of the email pattern, I get results only when there are exactly three. Is there a way to get results no matter how many email addresses appear in the raw data?

0 Karma

blaise
Explorer

Hi willymac650,
try this regular expression:
(("[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-.]+")+),
The double brackets tell it to repeat the pattern matching. I am no expert; I just googled "find repeat patterns in regex" and one of the pages explained this. I tried it on regex101.com with your data and was able to match multiple times.
The only problem now is that I don't know how to name each match using $.
Hope this helps
Blaise
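
A minimal sketch of the "any number of addresses" idea, written in Python rather than Splunk: match a single quoted email address and collect every occurrence, instead of chaining a fixed number of copies of the pattern. The helper name extract_emails and the third address are made up for illustration. In Splunk itself, the rex command has a max_match option (max_match=0 for unlimited) that returns all matches as a multivalue field, which looks like the closest equivalent.

```python
import re

# A single quoted email address; the capture group excludes the quotes.
QUOTED_EMAIL = re.compile(
    r'"([a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-.]+\.[a-zA-Z]{2,5})"'
)

def extract_emails(raw):
    """Return every quoted email address in raw, in order of appearance."""
    return QUOTED_EMAIL.findall(raw)

# Hypothetical events with one and three addresses.
one = '"10/30/2018 7:31:08 AM","natalie.someone@email.com","200"'
three = ('"natalie.someone@email.com","andrew.someone@email.com",'
         '"third.someone@email.com"')

print(extract_emails(one))
print(extract_emails(three))
```

The result is a list, so individual addresses are picked by position rather than by separate field names.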

0 Karma

dannili
Communicator

FrankVl's regex worked perfectly, thanks a lot! Suggestion noted as well.

0 Karma

blaise
Explorer

I have tried it on regex101.com and I think this will help you:

\s+[.]{5}"",".+?",".+?",(?<email1>".+?"),(?<email2>".+?"),

It extracts both emails and creates two fields called "email1" and "email2" to contain the result of the match.

\s+ one or more whitespace characters
[.]{5} 5 dots
"", 2 double-quote characters, followed by a comma
".+?" 2 double quotes with anything inside; the ? makes the match as short as possible (lazy rather than greedy)
, a comma
".+?", same as above again
(?<email1>".+?") same as above, but this time it has parentheses around it, which says the match needs to be saved; by default it would be saved into $1, but the ?<email1> part names the variable into which the matching part will be saved
, a comma
(?<email2>".+?") same as above, but this time the variable is called email2
, a comma

Hope this helps
Blaise
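
Here is a small Python sketch of the breakdown above, for anyone who wants to see the lazy quantifier and the named groups in action. Python writes named groups as (?P<name>...), and the sample line assumes the event literally starts with whitespace and five dots, exactly as pasted in the question.

```python
import re

# Positional pattern from the post, with Python-style named groups.
POSITIONAL = re.compile(
    r'\s+[.]{5}"",".+?",".+?",(?P<email1>".+?"),(?P<email2>".+?"),'
)

# Assumes the event literally begins with a space and five dots.
raw = (' ....."","10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
       '"natalie.someone@email.com","andrew.someone@email.com",'
       '"UCCAPI/16.0.10730.2342342 OC/16.0.10730.20088 (Skype for Business)",')

m = POSITIONAL.search(raw)
fields = m.groupdict() if m else {}
print(fields)  # the quotes are inside the groups, so the values keep them

# Lazy vs greedy on a tiny example: .+? stops at the first closing quote,
# .+ runs on to the last one.
sample = '"a@x.com","b@x.com"'
lazy = re.search(r'".+?"', sample).group()
greedy = re.search(r'".+"', sample).group()
print(lazy)
print(greedy)
```

Moving the quotes outside the parentheses would give field values without the surrounding quote characters.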

0 Karma

FrankVl
Ultra Champion

The poster is only showing a fragment of the log, so the leading dots just mark omitted text, and \s+[.]{5} will not match what is actually at the start of the data. That's why, for my answer, I just created a regex that looks for 2 consecutive valid email addresses.
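
A quick way to see the difference, sketched in Python with (?P<name>...) named groups. The event below is hypothetical: it puts a made-up leading field where the pasted fragment had dots, which is an assumption about what the full log might look like.

```python
import re

# Pattern anchored on the literal five dots from the pasted fragment.
ANCHORED = re.compile(
    r'\s+[.]{5}"",".+?",".+?",(?P<email1>".+?"),(?P<email2>".+?"),'
)

# Pattern that only looks for two consecutive quoted email addresses.
EMAIL = r'[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}'
PAIR = re.compile(r'"(?P<caller>' + EMAIL + r')","(?P<receiver>' + EMAIL + r')"')

# Hypothetical full event: a real leading field instead of the dots.
real = ('"some-leading-field","","10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
        '"natalie.someone@email.com","andrew.someone@email.com","200",')

anchored_hit = ANCHORED.search(real)
pair_hit = PAIR.search(real)

print(anchored_hit)          # None: there are no literal dots to anchor on
print(pair_hit.groupdict())  # both addresses found regardless of position
```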

0 Karma