Splunk Search

How can I extract this email address field and name it "email_id"?

cyberhumint
New Member

What would be the correct expression to extract only the email address that follows "email="? I then want to call that field "email_id".

1510591529.811934 IP xx.xxx.xxx.xxx.80 > xxx.xxx.xxx.xxx.49819: Flags [P.], seq 1:393, ack 578, win 30, options [nop,nop,TS val 2082754724 ecr 1683330855], length 392: HTTP: HTTP/1.1 302 Found
E.....@.4..AQ....Cj..P........r............
|$P.dU.'HTTP/1.1 302 Found
Date: Mon, 13 Nov 2017 16:45:29 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.14
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /login.php?err=1&email=xxxxxxx%40xxxxxxxxx.xxx
Content-Length: 0
Connection: close
Content-Type: text/html

0 Karma
1 Solution

rphillips_splk
Splunk Employee

You could do the following with an inline regex extraction in your search:

index=x sourcetype=y | rex field=_raw "email=(?<email_id>\S+)"

If you want to create a search-time field extraction, so that you don't need to extract the field with rex each time you run the search, you could do the following:

  1. Determine the sourcetype of the event
  2. Build a field extraction applied to this sourcetype on your search head in props.conf

example:

On Search Head:
$SPLUNK_HOME/etc/system/local/props.conf

[youreventsourcetype]
EXTRACT-email = email=(?<email_id>\S+)

Restart Splunk on the search head:

$SPLUNK_HOME/bin
./splunk restart
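
To see what the \S+ capture actually picks up, the pattern can be tried outside Splunk. This is only a sketch: it uses Python's re module as a stand-in for rex, and Python spells named groups (?P<name>...) where the rex examples here use (?<name>...). The helper name extract_email_id is made up for illustration. The first sample is the Location line from the question; the second is the form-data shape that comes up later in this thread.

```python
import re

# Python stand-in for: rex field=_raw "email=(?<email_id>\S+)"
EMAIL_RE = re.compile(r"email=(?P<email_id>\S+)")

def extract_email_id(raw):
    """Return the email_id capture, or None when the text has no email= key."""
    match = EMAIL_RE.search(raw)
    return match.group("email_id") if match else None

header_line = "Location: /login.php?err=1&email=xxxxxxx%40xxxxxxxxx.xxx"
form_line = "email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription"

# The email is the last token on the line, so \S+ stops right after it.
print(extract_email_id(header_line))
# & is not whitespace, so here \S+ keeps going through the other parameters.
print(extract_email_id(form_line))
```

Note that the captured value is still URL-encoded (%40 stands for @). If a decoded address is needed, Splunk's eval function urldecode() should be able to handle that after the extraction.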


0 Karma

niketn
Legend

@cyberhumint, can you try the following? By default, the . in a rex pattern does not match a newline character, so the match ends at the end of the line.

<YourBaseSearch>
| rex "email=(?<email_id>.*)"

Please try out and confirm. You can use regex101.com to test the regex with your sample data.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

cyberhumint
New Member

Excellent! Thank you, that is very helpful. However, it is now returning more than I want, and that is my fault for not including this in the original post.

Some of the raw data also looks something like this:
email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription

The issue is that when I run the expression you provided, it also includes everything that follows the "&", in this case firstName=xxxxxxxxxxxxxxxxx etc...

How do I extract the email address only up to the first "&" and nothing more?

Thank you so much for helping me with this it is truly appreciated!

0 Karma

niketn
Legend

I had suggested .* based on the fact that you wanted to extract everything. A regular expression depends very much on the pattern in the data, and in this case you need the match to end at the first & encountered after the email. So try the following:

<YourBaseSearch>
 | rex "email=(?<email_id>[^\&]+)\&"

Do test out regular expression on regex101.com which will also explain how regular expression performed pattern matching.


Updated: I had missed a + sign to repeat the pattern until & is found for the first time. Please try this one instead.
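
One possible catch with this pattern (an observation, not something confirmed in the thread): it requires a literal & after the value, and in the redirect event from the original question the email is the last thing on the Location line. The sketch below compares it with a hypothetical variant, [^&\s]+, that stops at either & or whitespace and does not require a trailing &. Python's re module stands in for rex, so the named groups are written (?P<name>...) instead of (?<name>...).

```python
import re

# Pattern from the post above: a literal & must follow the value.
NEEDS_AMPERSAND = re.compile(r"email=(?P<email_id>[^&]+)&")
# Hypothetical variant: stop at & or whitespace, no trailing & required.
STOPS_AT_DELIMITER = re.compile(r"email=(?P<email_id>[^&\s]+)")

def capture(pattern, raw):
    """Return the email_id capture for this pattern, or None if it does not match."""
    match = pattern.search(raw)
    return match.group("email_id") if match else None

events = [
    "email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription",
    "Location: /login.php?err=1&email=xxxxxxx%40xxxxxxxxx.xxx",
]

for raw in events:
    # Second event: the first pattern finds no & after the email and returns None.
    print(capture(NEEDS_AMPERSAND, raw), "|", capture(STOPS_AT_DELIMITER, raw))
```

If the variant behaves the same way in Splunk, the equivalent search would be | rex "email=(?<email_id>[^&\s]+)", but please test it against real events first.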

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

cyberhumint
New Member

Thank you, but the above expression does not return the value of email= to email_id field.

Your original expression worked great!

Is there an expression like your first suggestion of

| rex "email=(?<email_id>.*)" that I can wildcard everything after the "&"?

In some cases the raw data is email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription
or
email=xxxxxxxxxxxxxxxxxxx&phone=xxxxxxxxxxxxxxx&
or
email=xxxxxxxxxxxxxxxxxxx&submitform=xxxxxxxxxxxxxxx& etc...

0 Karma

niketn
Legend

I had missed the + sign and have updated the regex. Along similar lines, you can use the following:

|  rex "email=(?<email_id>[^&]+)&"
|  rex "firstName=(?<firstName>[^&]+)&"
|  rex "phone=(?<phone>[^&]+)&"

....
....

However, as I mentioned before, regular expressions are essentially pattern matching, so we would need several sample events to come up with the exact start and end pattern for each field to be extracted. You can mock or anonymize any sensitive data, for example:

email=testemail@abc.com&firstName=blahblah&
Also, are all the fields you want to extract always present in the event, or is it one or the other? If they are not always present, a sample of each type of event is also required.

If your raw events contain these key-value pairs, you can pipe directly to the KV command to extract them:

<YourBaseSearch>
| KV

Otherwise, try the extract command with = as the key-value delimiter and & as the pair delimiter:

<YourBaseSearch>
| extract pairdelim="&", kvdelim="="

Please try out and confirm.
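
As a rough picture of what splitting on a pair delimiter and a key-value delimiter means, here is a small Python sketch. It only illustrates the idea and is not how Splunk implements extract; the function name split_pairs is made up, its parameter names simply mirror the extract options, and Splunk may treat edge cases (empty values, keys without a value) differently.

```python
def split_pairs(raw, pairdelim="&", kvdelim="="):
    """Split raw into {key: value}, skipping chunks that have no kvdelim."""
    fields = {}
    for chunk in raw.split(pairdelim):
        key, sep, value = chunk.partition(kvdelim)
        if sep and key:  # keep only chunks of the form key=value
            fields[key] = value
    return fields

sample = "email=testemail@abc.com&firstName=blahblah&zipCode=&subscription"
fields = split_pairs(sample)
print(fields["email"])  # testemail@abc.com
print(sorted(fields))   # keys that had a kvdelim; "subscription" is skipped
```

With this approach the email value ends at the first & regardless of which parameter comes next, which is the behavior asked for above.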

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

niketn
Legend

Just curious: have you run your base search in Verbose mode to show the raw events and time? If these field names are not displayed automatically as Interesting Fields, it implies that in your props.conf file you have either set KV_MODE=none or changed it from auto to something else.
The available settings are as follows (refer to the Splunk docs: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Field_extraction_configuration):

KV_MODE = [none|auto|auto_escaped|multi|json|xml]
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

rphillips_splk
Splunk Employee

You can anchor the capture group at the end, like:

email=(?<email_id>.+)&firstName=

index=x sourcetype=y | rex field=_raw "email=(?<email_id>.+)&firstName="

And for the search-time extraction:
[sourcetype]
EXTRACT-email = email=(?<email_id>.+)&firstName=
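
Worth keeping in mind with an end anchor like this: the pattern matches only when &firstName= actually follows the email. A quick check with Python's re module as a stand-in for rex (named groups are written (?P<name>...) there), using the sample shapes posted earlier in the thread:

```python
import re

# Python stand-in for: rex field=_raw "email=(?<email_id>.+)&firstName="
ANCHORED = re.compile(r"email=(?P<email_id>.+)&firstName=")

def capture(raw):
    """Return the email_id capture, or None when the anchor text is absent."""
    match = ANCHORED.search(raw)
    return match.group("email_id") if match else None

with_first_name = "email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription"
with_phone = "email=xxxxxxxxxxxxxxxxxxx&phone=xxxxxxxxxxxxxxx&"

print(capture(with_first_name))  # the value between email= and &firstName=
print(capture(with_phone))       # None: there is no &firstName= to anchor on
```

So this form fits events where firstName always comes right after email; for the phone and submitform variants, a delimiter-based pattern or key-value extraction is likely the safer choice.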

0 Karma

cyberhumint
New Member

Thank you so much!!!

Only wish I would have tried the community here sooner.

Thanks again all.

0 Karma
