Are you wanting to extract everything before the "@gmail.com"?
... | rex field=emailfield "(?<name>[^@]+)@gmail\.com"
Or do something different...
Nothing like that. I'm looking for dot trend patterns, and not only on Gmail: any email ID that follows the dot trend. Extracting the email itself is not the issue; finding the ones with the dot trend is. Many users are creating IDs with scripts using the dot trend, and monitoring for that alone is very difficult. If there is any way to do that, it would be very useful.
Yes I agree. Please clarify, showing examples where possible. The phrase "Email IDs with dot trend patterns" doesn't really show anything if I google it. (Well, it does now. It shows this splunk answer. You have now learnt a valuable lesson about SEO)
Let me explain the dot trend pattern.
For example, a retail store (website) may offer discount coupons for new registrations, so users create new accounts with multiple email IDs. Those IDs have to be unique, so they use scripts or similar tools to insert dots at different positions in the same email ID and generate many variants.
This is how they create IDs and collect the discount coupons for their purchases.
Please let me know if you need more info.
One option would be to extract the email "name", remove the dots, and then dedupe or use dc(name). Something like this:
.... | rex field=emailfield "(?<name>[^@]+)@" | eval name=replace(name, "\.", "") | stats dc(name)
First of all those gmail addresses you posted? They are all the same mailbox (see https://support.google.com/mail/answer/10313?hl=en).
You could split the email up as follows (including the domain, as sundareshr pointed out):
| rex field=emailfield "(?<namePartA>.*?)\.(?<namePartB>.*?)@(?<domain>.*)"
You will then have three new fields: namePartA (before the first dot), namePartB (between the first dot and the @) and domain. You can arrange these as you wish to capture the patterns (e.g. | stats values(namePartB) by namePartA), but I'm not sure how helpful that is going to be.
Instead I would have a play with the cluster command (http://docs.splunk.com/Documentation/Splunk/6.3.2/SearchReference/Cluster). Try something like:
... | cluster field=emailfield showcount=t | table cluster_count emailfield _raw | sort -cluster_count
You may have to play around with the value of the cluster threshold (read the docs!). The beauty of this is that you won't have to worry about regex issues and you can easily see the most common matches.
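As a rough illustration of tuning the threshold: the cluster command takes it through the t option, a number greater than 0.0 and less than 1.0 (default 0.8), where values closer to 1.0 require events to be more similar before they are grouped. The value 0.9 below is an arbitrary starting point, not a recommendation, and emailfield is the field name assumed throughout this thread:

```
... | cluster field=emailfield t=0.9 showcount=t
| table cluster_count emailfield
| sort -cluster_count
```

Lowering t should merge more addresses into each cluster and raising it should split them apart, so it is worth trying a few values against your own data.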
Finally, I think you have a bigger problem if you're trying to validate your customer accounts with Splunk. I am sure there are lots of good shopping cart security libraries out there. I would tell your developers to fix their application first!
What about the following snippet?
<your_search> | rex field=<email_field> "^(?P<email_user>.*)@(?P<email_domain>.*)$" | rex field=email_user mode=sed "s/\.//g" | rex field=email_user mode=sed "s/\+.*$//g" | eval sanitized_email=email_user . "@" . email_domain | table <email_field>, sanitized_email
This should compute a sanitized version of the email address in a new field named sanitized_email.
Thanks! It's cool, but it returns the sanitized email derived from the dot trend emails.
Actual emails with dot trends:
h.au.517@gmail[.]com
h.au5.17@gmail[.]com
h.au51.7@gmail[.]com
ha.u.517@gmail[.]com
ha.u51.7@gmail[.]com
ha.u517@gmail[.]com
ha.u5.17@gmail[.]com
Actually I need the dot trend emails alone, not the sanitized one.
Anyway thanks for this one!
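One way to get only the dotted variants, rather than the sanitized address, might be to group on the dot-stripped form and keep just the groups that contain more than one distinct original address. This is a minimal sketch, not a tested solution. It assumes the field is called emailfield, and it assumes that addresses differing only by dots in the part before the @ belong to the same person, which holds for Gmail but not necessarily for other providers. The names email_key, variant_count and dot_variants are made up for illustration:

```
... | rex field=emailfield "^(?<email_user>[^@]+)@(?<email_domain>.+)$"
| eval email_key=lower(replace(email_user, "\.", "")) . "@" . lower(email_domain)
| stats dc(emailfield) AS variant_count values(emailfield) AS dot_variants BY email_key
| where variant_count > 1
| sort -variant_count
```

Each resulting row is one underlying mailbox, with dot_variants listing the original dotted addresses that were registered for it. Raising the cutoff in the where clause (for example, variant_count > 3) would narrow the results to the heavier abusers.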