Splunk Search

How do I extract an email address from raw data using regex?

dannili
Communicator

Hi all, I have some raw data that looks like this (just a fragment):

....."","10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM","natalie.someone@email.com","andrew.someone@email.com","UCCAPI/3823.323.10827.* OC/16.0.10827.20150 (Skype for Business)","UCCAPI/16.0.10730.2342342 OC/16.0.10730.20088 (Skype for Business)","","","","****-5042-5F76-A879-***7","","","","","200","[IM]","{""RequestType"":""BYE"",""RequestTime"":""2018-10-30T07:41:52.2147589"",""ContentType"":"""",""ResponseCode"":""200"",..

I want to extract the two email addresses from each raw event (natalie.someone@email.com and andrew.someone@email.com in this case) into two fields, caller_email and receiver_email.

Does anyone know how to do this? Thanks a lot!

1 Solution

FrankVl
Ultra Champion

Try this regex: \"(?<caller>[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5})\",\"(?<receiver>[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5})\"
https://regex101.com/r/wsaYMy/1/
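
For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch. It assumes the event fragment from the question (shortened) and drops the \" escaping, which is only needed inside a Splunk search string. Python's re module writes named groups as (?P<name>...) instead of (?<name>...), so the pattern is adapted accordingly; the variable names are only illustrative.

```python
import re

# One email address, same character classes as the regex above.
EMAIL = r"[a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-.]+\.[a-zA-Z]{2,5}"

# Two quoted email addresses separated by a comma.
# Python spells named groups (?P<name>...) rather than (?<name>...).
PAIR = re.compile(rf'"(?P<caller>{EMAIL})","(?P<receiver>{EMAIL})"')

# Shortened fragment of the sample event from the question.
event = (
    '"10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
    '"natalie.someone@email.com","andrew.someone@email.com",'
    '"UCCAPI/3823.323.10827.* OC/16.0.10827.20150 (Skype for Business)"'
)

match = PAIR.search(event)
fields = match.groupdict() if match else {}
print(fields)
```

The group names follow the answer above (caller, receiver); rename them to caller_email and receiver_email if you want the field names from the question.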

But it might be worth investing some time in defining a proper DELIMS-based extraction for the entire event.
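
To illustrate the delimiter-based idea outside Splunk, here is a rough Python sketch that splits a quoted, comma-separated event into named columns. In Splunk this would be done with the DELIMS and FIELDS settings in transforms.conf, not with Python. The column names are hypothetical, the fragment does not start at the real beginning of the event, and the JSON payload is shortened and closed by hand so that it parses.

```python
import csv
import io
import json

# Fragment of the sample event. The real event has more leading columns,
# and the JSON column is truncated in the question, so it is shortened and
# closed by hand here.
fragment = (
    '"10/30/2018 7:31:08 AM","10/30/2018 7:41:52 AM",'
    '"natalie.someone@email.com","andrew.someone@email.com",'
    '"UCCAPI/16.0.10730.2342342 OC/16.0.10730.20088 (Skype for Business)",'
    '"200","[IM]","{""RequestType"":""BYE"",""ResponseCode"":""200""}"'
)

# Hypothetical column names, for illustration only.
COLUMNS = [
    "start_time", "end_time", "caller_email", "receiver_email",
    "client_version", "response_code", "media", "details",
]

# csv handles the surrounding quotes and the doubled "" escapes.
row = next(csv.reader(io.StringIO(fragment)))
record = dict(zip(COLUMNS, row))
details = json.loads(record["details"])

print(record["caller_email"], record["receiver_email"], details["RequestType"])
```

The advantage over a single regex is that every column becomes a field at once, including the embedded JSON, as long as the column order is stable.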


willymac650
New Member

I have a very similar question, although I could have one, two or three email addresses in the raw data. If I use the answer below, I get results only if there are exactly two email addresses. If I modify it by duplicating the pattern once more, I get results only if there are exactly three. Is there a way to get results no matter how many email addresses appear in the raw data?


blaise
Explorer

Hi willymac650,
try this regular expression:
(("[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+")+),
The + after the inner parentheses tells it to repeat the pattern. I am no expert, I just googled "find repeat patterns in regex" and one of the pages explained this. I tried it on regex101.com with your data and was able to match multiple times.
The only problem now is that I don't know how to name each match using $.
Hope this helps
Blaise
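
A different way to handle a variable number of addresses is to match one address at a time and collect every hit, instead of repeating a group. In Splunk, the rex command's max_match option (0 for unlimited) does this and returns a multivalue field; check the rex documentation for your version. The Python sketch below shows the same idea with re.findall. The sample events and the function name are made up for illustration.

```python
import re

# One quoted email address; the capture group excludes the quotes.
EMAIL = re.compile(r'"([a-zA-Z0-9_\-.]+@[a-zA-Z0-9_\-.]+\.[a-zA-Z]{2,5})"')

def extract_emails(event):
    """Return every quoted email address in the event, in order."""
    return EMAIL.findall(event)

# Hypothetical events with one, two and three addresses.
events = [
    '"x","natalie.someone@email.com","200"',
    '"x","natalie.someone@email.com","andrew.someone@email.com","200"',
    '"x","a.b@email.com","c.d@email.com","e.f@email.com","200"',
]

counts = [len(extract_emails(e)) for e in events]
print(counts)
```

Because each match is independent, the same pattern works for any number of addresses, and the position in the returned list can stand in for a field name.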



dannili
Communicator

This worked perfectly, thanks a lot! Suggestion noted as well.


blaise
Explorer

I tried it on regex101.com and I think this will help you:

\s+[.]{5}"",".+?",".+?",(?<email1>".+?"),(?<email2>".+?"),

It extracts both emails and creates two fields called "email1" and "email2" to hold the matches.

\s+ one or more whitespace characters
[.]{5} five dots
"", two double-quote characters, followed by a comma
".+?" a pair of double quotes with anything inside; the ? makes the match as short as possible (lazy rather than greedy)
, a comma
".+?", same as above again
(?<email1>".+?") same as above, but the parentheses say the match should be saved; by default it would go into $1, but the ?<email1> part names the variable that holds the match
, a comma
(?<email2>".+?") same as above, but this time the variable is called email2
, a comma
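
The difference the ? makes after .+ can be seen in a small Python sketch. The input line is made up for illustration.

```python
import re

line = '"first","second","third"'

# Greedy: .+ grabs as much as possible, so the match runs to the last quote.
greedy = re.match(r'".+"', line).group()

# Lazy: .+? stops at the first closing quote it can.
lazy = re.match(r'".+?"', line).group()

print(greedy)
print(lazy)
```

Here the greedy version swallows the whole line, while the lazy version stops after the first quoted value, which is why the pattern above uses .+? for each column.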

Hope this helps
Blaise


FrankVl
Ultra Champion

He is only showing a fragment of his log, so \s+[.]{5} is not what his data actually starts with. That's why my answer just uses a regex that looks for two consecutive valid email addresses.
