How to search for and extract Email IDs with dot trend patterns?

kamaleshwar · ‎12-27-2015

I would like to know whether there is any possibility of extracting or getting the Email IDs with dot trend patterns.

(E.g) search.splunk@gmail.com
searchs.plunk@gmail.com
searchsp.lunk@gmail.com and so on...

Thanks in advance!

IrenSari · ‎10-02-2016

I use software from https://www.atompark.com/ to extract the available addresses.

kamaleshwar · ‎12-31-2015

I've written a rex command to pull the emails, but it only matches addresses with exactly one dot before the domain name. I need it to also match addresses with more than one dot there. Here is my rex command; please help me with this. Can we add an OR condition here? Would that work in rex?

rex "^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+\.\w+@\w+\.\w+)"

woodcock · ‎12-31-2015

It appears that you are unaware of Google's implementation of unlimited email aliases. For Gmail (and possibly other email systems), periods and plus signs in the username do not create a distinct mailbox, because those characters are how email aliases work: Gmail ignores all periods in the username and everything from a plus sign onward. See this blog post for details:

http://gmailblog.blogspot.com/2008/03/2-hidden-ways-to-get-more-from-your.html

So for gmail, you need to normalize usernames the same way, like this:

... | rex field=email "^(?<username>[^@]*)@(?<domain>.*)$" | eval username=if(domain="gmail.com", replace(replace(username, "\.", ""), "\+.*", ""), username)
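For anyone who wants to check the normalization logic outside Splunk, here is a minimal Python sketch of the same idea. The function name `normalize` is made up for illustration, and it assumes the alias rule applies only to the gmail.com domain, as in the search above.

```python
import re

def normalize(email: str) -> str:
    """Collapse Gmail-style aliases; leave other domains untouched."""
    username, _, domain = email.partition("@")
    if domain.lower() == "gmail.com":
        username = username.replace(".", "")      # periods are ignored
        username = re.sub(r"\+.*", "", username)  # drop "+anything" suffix
    return f"{username}@{domain}"

for address in ("search.splunk@gmail.com", "s.r+promo@gmail.com", "s.r@example.com"):
    print(address, "->", normalize(address))
```

Note that the plus sign has to be escaped in the pattern; an unescaped leading `+` is a quantifier with nothing to repeat.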

kamaleshwar · ‎12-31-2015

This one is okay, but I don't need it, as I've already written a rex command to pull the emails. However, my rex only matches addresses with exactly one dot before the domain name; I need it to also match addresses with more than one dot there. Here is my rex command; please help me with this. Can we add an OR condition here? Would that work in rex?

rex "^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+\.\w+@\w+\.\w+)"
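On the "more than one dot" question: regular expressions do support alternation with `|`, but a repeated group is usually simpler here. The sketch below is Python rather than SPL and only illustrates the pattern change: `\w+\.\w+` becomes `\w+(?:\.\w+)+`, meaning one or more dot-separated chunks. The sample lines and the helper name `extract` are hypothetical; the five colon-delimited prefix fields are taken from the rex above.

```python
import re

# Same prefix as the rex above; the local part now allows one or more dots.
PATTERN = re.compile(
    r"^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+(?:\.\w+)+@\w+(?:\.\w+)+)"
)

def extract(line: str):
    """Return the dotted email found after the fifth colon, or None."""
    match = PATTERN.search(line)
    return match.group("email_dot") if match else None

print(extract("f1:f2:f3:f4:f5: ha.u5.17@gmail.com"))
print(extract("f1:f2:f3:f4:f5: hau517@gmail.com"))
```

An address with no dot in the local part is deliberately not matched, which mirrors the original rex.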

woodcock · ‎01-05-2016

Ditch your rex and use mine (assuming that there is a field that contains the entire email address).

sttang88 · ‎12-30-2015

Hi there!

What about the following snippet?

<your_search>
| rex field=<email_field> "^(?P<email_user>.*)@(?P<email_domain>.*)$"
| rex field=email_user mode=sed "s/\.//g"
| rex field=email_user mode=sed "s/\+.*$//g"
| eval sanitized_email=email_user . "@" . email_domain
| table <email_field>, sanitized_email

This should compute a sanitized version of the email address in a new field named sanitized_email.

Regards,

kamaleshwar · ‎12-30-2015

Thanks! It's cool, but it returns the sanitized email derived from the dot trend emails.

E.g.

Actual Email with Dot trends

h.au.517@gmail[.]com
h.au5.17@gmail[.]com
h.au51.7@gmail[.]com
ha.u.517@gmail[.]com
ha.u51.7@gmail[.]com
ha.u517@gmail[.]com
ha.u5.17@gmail[.]com

Sanitized Email

hau517@gmail[.]com

Actually, I need the dot trend emails alone, not the sanitized one.

Anyway thanks for this one!
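One way to get the dot trend emails themselves rather than the sanitized form is to use the sanitized address only as a grouping key and keep the original spellings. The following is a rough Python sketch of that idea, not a Splunk search; the function name `dot_trend_groups` and the sample list are assumptions for illustration.

```python
from collections import defaultdict

def dot_trend_groups(emails):
    """Group addresses by dot-stripped username and domain.

    Only groups reached through more than one spelling are returned,
    so ordinary single addresses drop out.
    """
    groups = defaultdict(set)
    for email in emails:
        user, _, domain = email.lower().partition("@")
        groups[(user.replace(".", ""), domain)].add(email.lower())
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

sample = [
    "h.au.517@gmail.com",
    "ha.u517@gmail.com",
    "ha.u5.17@gmail.com",
    "someone@example.com",
]
print(dot_trend_groups(sample))
```

The key includes the domain so that the same username on two different providers is not merged.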

jplumsdaine22 · ‎12-30-2015

First of all those gmail addresses you posted? They are all the same mailbox (see https://support.google.com/mail/answer/10313?hl=en).
You could split the email up as follows (including the domain, as sundareshr pointed out):

| rex field=emailfield "(?<namePartA>.*?)\.(?<namePartB>.*?)@(?<domain>.*)"

You will then have three new fields: namePartA (before the first dot), namePartB and domain. You can arrange these as you wish to capture the patterns (e.g. | stats values(namePartB) by namePartA), but I'm not sure how helpful that is going to be.

Instead I would have a play with the cluster command (http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/Cluster). Try something like:

... | cluster field=emailfield showcount=t | table cluster_count emailfield _raw | sort -cluster_count

You may have to play around with the value of the cluster threshold (read the docs!). The beauty of this is that you won't have to worry about regex issues and you can easily see the most common matches.

Finally, I think you have a bigger problem if you're trying to validate your customer accounts with Splunk. I am sure there are lots of good shopping cart security libraries out there. I would tell your developers to fix their application first!

sundareshr · ‎12-29-2015

One option would be to extract the email "name", remove the 'dot' and dedupe or dc(name). Something like this

.... | rex field=emailfield "(?<name>[^@]+)@" | eval name=replace(name, "\.", "") | stats dc(name)

The risk here is that you could lose valid email addresses that have similar names but different domains (s.r@gmail.com vs sr@yahoo.com). Hopefully, this gets you going.
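The domain risk mentioned above is easy to see with a tiny example. This is a Python sketch rather than SPL, with a made-up helper `split_name`; it counts distinct names with and without the domain in the key.

```python
def split_name(email: str):
    """Return (dot-stripped name, domain) for an address."""
    name, _, domain = email.partition("@")
    return name.replace(".", ""), domain

emails = ["s.r@gmail.com", "sr@yahoo.com", "s.r@yahoo.com"]

by_name = {split_name(e)[0] for e in emails}          # name only
by_name_and_domain = {split_name(e) for e in emails}  # name plus domain

print(len(by_name), len(by_name_and_domain))
```

Counting by name alone merges the gmail.com and yahoo.com mailboxes into one; keeping the domain in the key keeps them apart while still collapsing the two yahoo.com spellings.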

woodcock · ‎12-29-2015

Define "dot trend pattern"; nobody understands what you mean!

kamaleshwar · ‎12-29-2015

Let me explain the dot trend pattern.
For example, a retail store's website may offer discount coupons for new registrations, so users create new accounts with multiple email IDs. Those IDs have to be unique, so they insert dots at different positions in the email ID, using scripts or something similar, to create many of them:
sample@example.com
s.ample@example.com
sa.mple@example.com
sam.ple@example.com

In this way they create IDs and collect the discount coupons to use on purchases.

Please let me know if you need more info.
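To make the scale of the pattern concrete: a username of n characters has n-1 gaps, and each gap either holds a dot or not, so there are 2^(n-1) possible spellings. The Python sketch below enumerates them for the "sample" example above; the function name `dot_variants` is hypothetical. It suggests why matching individual spellings does not scale and why stripping the dots before comparing is the more practical route.

```python
from itertools import product

def dot_variants(local: str):
    """Yield every spelling of `local` with at most one dot in each gap."""
    gaps = max(len(local) - 1, 0)
    for mask in product(("", "."), repeat=gaps):
        # Pair each character with the separator that follows it;
        # the last character is followed by nothing.
        yield "".join(ch + sep for ch, sep in zip(local, mask + ("",)))

variants = list(dot_variants("sample"))
print(len(variants))
print(variants[:4])
```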

jplumsdaine22 · ‎12-29-2015

Yes I agree. Please clarify, showing examples where possible. The phrase "Email IDs with dot trend patterns" doesn't really show anything if I google it. (Well, it does now. It shows this splunk answer. You have now learnt a valuable lesson about SEO)

esix_splunk · ‎12-27-2015

Are you wanting to extract everything before the "@gmail.com"?

... | rex field=emailfield "(?<name>[^@]+)@gmail\.com"

Or do something different...

kamaleshwar · ‎12-28-2015

Nothing like that; I'm looking for the dot trend patterns, and not only for Gmail but for all email IDs that show a dot trend. Getting the emails is not the issue; finding the emails with a dot trend is. Many users are creating IDs with scripts using the dot trend, and monitoring for that alone is very difficult. If there is any way to do that, it would be very useful.
