What would be the correct expression to extract only the email address that follows "email="? I then want to call that field "email_id".
1510591529.811934 IP xx.xxx.xxx.xxx.80 > xxx.xxx.xxx.xxx.49819: Flags [P.], seq 1:393, ack 578, win 30, options [nop,nop,TS val 2082754724 ecr 1683330855], length 392: HTTP: HTTP/1.1 302 Found
E.....@.4..AQ....Cj..P........r............
|$P.dU.'HTTP/1.1 302 Found
Date: Mon, 13 Nov 2017 16:45:29 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.14
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /login.php?err=1&email=xxxxxxx%40xxxxxxxxx.xxx
Content-Length: 0
Connection: close
Content-Type: text/html
You could do the following with an inline regex extraction in your search:
index=x sourcetype=y | rex field=_raw "email=(?<email_id>\S+)"
If you want a search-time field extraction, so that you don't need to extract the field with rex each time you run the search, you could do the following:
example:
On Search Head:
$SPLUNK_HOME/etc/system/local/props.conf
[youreventsourcetype]
EXTRACT-email = email=(?<email_id>\S+)
Restart Splunk on the search head:
cd $SPLUNK_HOME/bin
./splunk restart
@cyberhumint, can you try the following? By default, the dot in a rex pattern does not match a newline character, so .* stops matching at the end of the line.
<YourBaseSearch>
| rex "email=(?<email_id>.*)"
Please try out and confirm. You can use regex101.com to test the regex with your sample data.
Excellent! Thank you, very helpful. However, it is now returning the following, and that is my fault for not including it in the original post.
Some of the raw data also looks something like this:
email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription
The issue now is that when I run the expression you provided, it also includes everything that follows the "&", in this case firstName=xxxxxxxxxxxxxxxxx etc...
How do I extract the email address only up to the first "&" and nothing more?
Thank you so much for helping me with this it is truly appreciated!
I had suggested .* based on the fact that you wanted to extract everything. Regular expressions depend very much on patterns, and in this case you need your regex match to end at the first & encountered after the email. So try the following:
<YourBaseSearch>
| rex "email=(?<email_id>[^\&]+)\&"
Do test the regular expression on regex101.com, which will also explain how the regular expression performed the pattern matching.
Updated: I had missed a + sign to repeat the pattern until & is found for the first time. Please try this one instead.
Thank you, but the above expression does not return the value of email= to email_id field.
Your original expression worked great!
Is there an expression like your first suggestion of
| rex "email=(?<email_id>.*)" that I can wildcard everything after the "&"?
In some cases the raw data is email=xxxxxxxxxxxxxxxxxxx&firstName=xxxxxxxxxxxxxxx&zipCode=&subscription
or
email=xxxxxxxxxxxxxxxxxxx&phone=xxxxxxxxxxxxxxx&
or
email=xxxxxxxxxxxxxxxxxxx&submitform=xxxxxxxxxxxxxxx& etc...
I had missed the + sign. I have updated the regex. Along similar lines, you can use the following:
| rex "email=(?<email_id>[^&]+)&"
| rex "firstName=(?<firstName>[^&]+)&"
| rex "phone=(?<phone>[^&]+)&"
....
....
However, as I mentioned before, a regular expression is essentially pattern matching, so we would require various sample events to come up with the exact start and end patterns for the fields to be extracted. You can mock or anonymize sensitive data, for example:
email=testemail@abc.com&firstName=blahblah&
Also, are all the fields that you want to extract always present in the event, or is it one or the other? If they are not always present, a sample of each type of event is also required.
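To make the difference between the patterns in this thread concrete, here is a minimal Python sketch that runs three variants against two mocked events. It is only an illustration, not Splunk itself: Python's re module spells named groups (?P<name>...) where rex uses (?<name>...), the sample strings are made up, and the third pattern (no trailing &, stopping at & or whitespace) is an assumption about what would cover both event shapes. It suggests one possible reason the trailing-& version returned nothing for some events: the Location header ends right after the email value, so there is no & to match.

```python
import re

# Python named groups use (?P<name>...); Splunk's rex uses (?<name>...).
GREEDY = re.compile(r"email=(?P<email_id>.*)")             # runs to end of line
NEEDS_AMP = re.compile(r"email=(?P<email_id>[^&]+)&")      # requires a trailing &
STOP_AT_AMP = re.compile(r"email=(?P<email_id>[^&\s]+)")   # stops at & or whitespace


def extract(pattern, text):
    """Return the email_id capture, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group("email_id") if match else None


# Mocked events modelled on the samples posted in this thread.
header = "Location: /login.php?err=1&email=user%40example.com"
form = "email=testemail@abc.com&firstName=blahblah&"

for name, pattern in [("greedy", GREEDY),
                      ("needs &", NEEDS_AMP),
                      ("stop at &", STOP_AT_AMP)]:
    print(name, "|", extract(pattern, header), "|", extract(pattern, form))
```

Only the third variant returns just the email value for both events here; whether it fits the real data depends on the other event shapes in the index.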
If your raw events have these key-value pairs, you can pipe directly to the KV command to extract them:
<YourBaseSearch>
| KV
Or else try the extract command with = as the KV delimiter and & as the pair delimiter:
<YourBaseSearch>
| extract pairdelim="&", kvdelim="="
Please try out and confirm.
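For anyone unsure what the two delimiters mean, the following Python sketch mimics the idea: split the event on the pair delimiter, then split each pair on the key/value delimiter. The function name split_pairs and the sample string are hypothetical, and this is not how Splunk implements extract; it only shows which fields one would expect from data shaped like the samples above.

```python
def split_pairs(raw, pairdelim="&", kvdelim="="):
    """Split raw into key/value fields using the two delimiters."""
    fields = {}
    for pair in raw.split(pairdelim):
        # Skip fragments that carry no key/value delimiter (e.g. a trailing "&").
        if kvdelim in pair:
            key, value = pair.split(kvdelim, 1)
            fields[key] = value
    return fields


# Mocked event; the values are placeholders.
event = "email=testemail@abc.com&firstName=blahblah&zipCode="
fields = split_pairs(event)
print(fields["email"])
```

Note that an empty value such as zipCode= still produces a field, here with an empty string.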
Just curious, have you run your base query in Verbose mode to show the raw events and time? If these field names are not displayed automatically as Interesting fields, it implies that you have either set KV_MODE=none or changed it from auto to something else in your props.conf file.
Following are various settings (refer to Splunk docs: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Field_extraction_configuration)
KV_MODE = [none|auto|auto_escaped|multi|json|xml]
You can anchor the capture group at the end like:
email=(?<email_id>.+)&firstName=
index=x sourcetype=y | rex field=_raw "email=(?<email_id>.+)&firstName="
And for the search-time extraction:
[sourcetype]
EXTRACT-email = email=(?<email_id>.+)&firstName=
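One thing to check before relying on the &firstName= anchor: earlier in the thread some events were described as having phone= or submitform= after the email instead. The Python sketch below (again using (?P<name>...) in place of rex's (?<name>...), with made-up sample events) shows how an anchored pattern behaves on both shapes. It is an illustration under those assumptions, not a statement about the actual data.

```python
import re

# Same idea as the rex above, written with Python's named-group syntax.
ANCHORED = re.compile(r"email=(?P<email_id>.+)&firstName=")


def extract(text):
    """Return the email_id capture, or None when the anchor is absent."""
    match = ANCHORED.search(text)
    return match.group("email_id") if match else None


# Mocked events: one followed by firstName=, one followed by phone=.
events = [
    "email=a@abc.com&firstName=blah&zipCode=",
    "email=a@abc.com&phone=555&",
]
results = [extract(event) for event in events]
print(results)
```

In this sketch the second event yields no email_id, so the anchor only suits events where firstName always follows the email.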
Thank you so much!!!
Only wish I would have tried the community here sooner.
Thanks again all.