Splunk Search

How to search for and extract Email IDs with dot trend patterns?

Explorer

Is there a way to extract email IDs that follow a dot trend pattern?

For example:
search.splunk@gmail.com
searchs.plunk@gmail.com
searchsp.lunk@gmail.com
and so on.

Thanks in advance!

New Member

I am using software from https://www.atompark.com/ to extract the available addresses.

Explorer

I've written a rex command to pull the emails, but it only matches emails with exactly one dot. I need it to also match emails with more than one dot before the domain name. Here is my rex command. Can we add an OR condition here? Would that work in rex?

rex "^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+\.\w+@\w+\.\w+)"

Esteemed Legend

It appears that you are unaware of Google's implementation of unlimited email aliases. In Gmail (and possibly other email systems), periods in the username and anything after a plus sign are ignored when mail is delivered, which is what makes these aliases possible. In effect, Gmail strips out all periods and anything after a plus sign. See this blog post for details:

http://gmailblog.blogspot.com/2008/03/2-hidden-ways-to-get-more-from-your.html

So for Gmail, you need to normalize usernames the same way, like this:

... | rex field=email "^(?<username>[^@]*)@(?<domain>.*)$" | eval username=if(domain="gmail.com", replace(replace(username, "\.", ""), "\+.*", ""), username)

Explorer

This one is okay, but I don't need it, as I've already written a rex command to pull the emails. The problem is that my rex command only matches emails with exactly one dot. I need it to also match emails with more than one dot before the domain name. Here is my rex command. Can we add an OR condition here? Would that work in rex?

rex "^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+\.\w+@\w+\.\w+)"
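
A possible adjustment, offered as a sketch rather than a tested extraction: the `\w+\.\w+` part matches exactly one dot, so making the dotted segment repeatable with `(?:\.\w+)+` lets the same capture group match one or more dots before the `@`. An alternation with `|` is valid inside a rex pattern, but a repeating group is usually simpler. The `{5}` colon prefix is kept from the original command and assumes the same event layout.

```spl
... | rex "^(?:[^:\n]*:){5}\s+(?P<email_dot>\w+(?:\.\w+)+@\w+(?:\.\w+)+)"
```

With this pattern, `h.au.517@gmail.com` and `ha.u517@gmail.com` would both be captured, while an address with no dot before the `@` would not. Note that `\w` does not match characters such as `-` or `+`, so addresses containing them would need a wider character class.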

Esteemed Legend

Ditch your rex and use mine (assuming that there is a field that contains the entire email address).

New Member

Hi there!

What about the following snippet?

<your_search>
| rex field=<email_field> "^(?P<email_user>.*)@(?P<email_domain>.*)$"
| rex field=email_user mode=sed "s/\.//g"
| rex field=email_user mode=sed "s/\+.*$//g"
| eval sanitized_email=email_user . "@" . email_domain
| table <email_field>, sanitized_email

This should compute a sanitized version of the email address in a new field named sanitized_email.

Regards,

Explorer

Thanks! It's cool, but it returns the sanitized email rather than the dot trend emails.

Example:

Actual emails with dot trends

h.au.517@gmail[.]com
h.au5.17@gmail[.]com
h.au51.7@gmail[.]com
ha.u.517@gmail[.]com
ha.u51.7@gmail[.]com
ha.u517@gmail[.]com
ha.u5.17@gmail[.]com

Sanitized email

hau517@gmail[.]com

What I actually need is the dot trend emails themselves, not the sanitized one.

Anyway thanks for this one!

Influencer

First of all, those Gmail addresses you posted are all the same mailbox (see https://support.google.com/mail/answer/10313?hl=en).
You could split the email up as follows (including the domain, as sundareshr pointed out):

| rex field=emailfield "(?<namePartA>.*?)\.(?<namePartB>.*?)@(?<domain>.*)"

You will then have three new fields: namePartA (before the first dot), namePartB (between the first dot and the @) and domain. You can arrange these as you wish to capture the patterns (e.g. | stats values(namePartB) by namePartA), but I'm not sure how helpful that is going to be.

Instead I would have a play with the cluster command (http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/Cluster). Try something like

... | cluster field=emailfield showcount=t | table cluster_count emailfield _raw | sort -cluster_count

You may have to play around with the value of the cluster threshold (read the docs!). The beauty of this is that you won't have to worry about regex issues, and you can easily see the most common matches.
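
As a sketch of that threshold tuning: `t` sets the similarity threshold (the cluster documentation gives a default of 0.8, and values closer to 1 require events to be more alike), and `match=ngramset` compares three-character substrings rather than whole terms, which may suit short strings such as email addresses. The value 0.7 below is only a starting guess, and `emailfield` is the same assumed field name as above.

```spl
... | cluster field=emailfield t=0.7 match=ngramset showcount=t
| table cluster_count emailfield
| sort -cluster_count
```

By default cluster keeps one representative event per cluster. Adding `labelonly=t` should instead keep every event and annotate it with its cluster label, which may be more useful if you want to see each dotted variant.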

Finally, I think you have a bigger problem if you're trying to validate your customer accounts with Splunk. I am sure there are lots of good shopping cart security libraries out there. I would tell your developers to fix their application first!

Legend

One option would be to extract the email "name", remove the dots, and then dedupe or use dc(name). Something like this:

... | rex field=emailfield "(?<name>[^@]+)@" | eval name=replace(name, "\.", "") | stats dc(name)

The risk here is that you could lose valid email addresses that have a similar name but different domains (s.r@gmail.com vs sr@yahoo.com). Hopefully, this gets you going.
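
One way to reduce that risk, sketched here under the assumption that the full address lives in a field called `emailfield`, is to extract the domain as well and group by both. The field names `variant_count` and `variants` are only illustrative.

```spl
... | rex field=emailfield "^(?<name>[^@]+)@(?<domain>.+)$"
| eval name=replace(name, "\.", "")
| stats dc(emailfield) as variant_count values(emailfield) as variants by name, domain
| where variant_count > 1
```

Each remaining row is a dot-stripped name and domain that appeared with more than one spelling, with the original addresses listed in `variants`.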

Esteemed Legend

Define "dot trend pattern"; nobody understands what you mean!

Explorer

Let me explain the dot trend pattern.
For example, a retail website may offer discount coupons for new registrations, so users create multiple new accounts with different email IDs. The IDs have to be unique, so they insert dots at different positions in the email ID, using scripts or something similar, to create many of them:
sample@example.com
s.ample@example.com
sa.mple@example.com
sam.ple@example.com

This is how they create IDs and collect the discount coupons to make purchases.
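
To make the pattern concrete, here is a small self-contained sketch that can be pasted into a search bar. It builds test events from the sample addresses above plus one unrelated address (`other@example.com`, added purely for illustration), counts the dots before the `@`, and keeps only dotted addresses whose dot-free form is shared with at least one other address. All field names (`local`, `canonical`, `dot_count`, `variant_count`) are assumptions made for this example.

```spl
| makeresults
| eval email=split("sample@example.com,s.ample@example.com,sa.mple@example.com,sam.ple@example.com,other@example.com", ",")
| mvexpand email
| rex field=email "^(?<local>[^@]+)@(?<domain>.+)$"
| eval dot_count=len(local) - len(replace(local, "\.", ""))
| eval canonical=replace(local, "\.", "") . "@" . domain
| eventstats dc(email) as variant_count by canonical
| where variant_count > 1 AND dot_count > 0
| table email, canonical, dot_count, variant_count
```

On this test data the search should keep the three dotted variants and drop both `sample@example.com` (no dot) and `other@example.com` (no other variants). Because eventstats annotates events instead of aggregating them, the original dotted addresses stay visible rather than being collapsed into the sanitized one.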

Please let me know if you need more info.

Influencer

Yes I agree. Please clarify, showing examples where possible. The phrase "Email IDs with dot trend patterns" doesn't really show anything if I google it. (Well, it does now. It shows this splunk answer. You have now learnt a valuable lesson about SEO)

Splunk Employee

Are you wanting to extract everything before the "@gmail.com"?

... | rex field=emailfield "(?<name>[^@]+)@gmail\.com"

Or do something different...

Explorer

Nothing like that. I'm looking for the dot trend patterns, and not only for Gmail but for every email ID that has a dot trend. I can already extract the emails, so that is not the issue; the issue is identifying the emails with a dot trend. Many users create IDs with scripts using the dot trend, and monitoring for that alone is very difficult. Anything that helps with that would be very useful.
