Splunk Search

Regex for URL parsing

ChhayaV
Communicator

Hi,

I want to extract URLs from the events as a separate field.

Here is a sample from the log file:

04/15/2013 17:51:58.09  w3wp.exe (0x113C)                           0x3D50  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))
04/15/2013 17:51:58.26  w3wp.exe (0x113C)                           0x4AA0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) 
04/15/2013 17:59:25.20  w3wp.exe (0x113C)                           0x14B0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))

I only want to extract the URLs of .aspx and .xap pages, like:
https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx
https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19

If I write the regex as (?i)\(GET:(?P<FIELDNAME>[^\?]+), the URL is not extracted properly.

Please help with the regex.
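To see why this pattern misbehaves, here is a minimal Python sketch that runs it over the message part of the three sample events. It assumes Python's re module behaves like Splunk's PCRE engine for these simple constructs. Python spells named groups as (?P<name>...), and PCRE also accepts that form. The names EVENTS and extract are illustrative only.

```python
import re

# Message tails of the three sample events above; the timestamp and
# process columns are left out because the pattern never looks at them.
EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

# The pattern from the question. [^\?]+ only stops at a "?" (or end of text).
QUESTION_PATTERN = re.compile(r"(?i)\(GET:(?P<FIELDNAME>[^\?]+)")


def extract(event):
    """Return the FIELDNAME capture, or None if the pattern does not match."""
    m = QUESTION_PATTERN.search(event)
    return m.group("FIELDNAME") if m else None


for event in EVENTS:
    print(extract(event))
```

Because nothing stops the match at the closing parenthesis, the .aspx URL keeps the trailing "))". The .jpg URL is not filtered out, and the .xap URL loses its ?ver=5.19 query string.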


MHibbin
Influencer

I'm not sure your second example is an .aspx file, but I'm not a web developer. However, the following regex will capture the URLs that end in ".aspx":

"GET:\w+://(?P<url>[^\)]+\.aspx)"

You can try out regular expressions on the following site, which is a handy tool:

http://gskinner.com/RegExr/

Hope this helps.
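A minimal Python sketch of how this answer's pattern behaves on the sample events follows. It assumes Python's re matches these constructs the same way Splunk's PCRE engine does. WITH_SCHEME is a hypothetical variant added for comparison, not part of the answer. Because \w+:// sits outside the named group, the captured url omits the "https://" scheme. Moving it inside the group keeps the scheme.

```python
import re

EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

# The answer's pattern: the scheme is matched but not captured.
ANSWER_PATTERN = re.compile(r"GET:\w+://(?P<url>[^\)]+\.aspx)")

# Hypothetical variant: same pattern with the scheme moved inside the group.
WITH_SCHEME = re.compile(r"GET:(?P<url>\w+://[^\)]+\.aspx)")


def extract(pattern, event):
    """Return the 'url' capture, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("url") if m else None


for event in EVENTS:
    print(extract(ANSWER_PATTERN, event), extract(WITH_SCHEME, event))
```

Only the .aspx event matches. The .jpg and .xap events return None, which is why the question about .xap pages comes up later in the thread.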


ChhayaV
Communicator

Hi,
I want to restrict my regex to the first match only.

Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/abc/CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthent...). Execution Time=17.1800154751023
If this is my log entry, I should get only "LoginPage.aspx", but the result is "LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx".
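The greedy [^\)]+ runs to the closing parenthesis and then backs off only as far as the last ".aspx". One common way to stop at the first one is a lazy quantifier ([^\)]+?). Another is to exclude "?" from the character class so the match cannot enter the query string. Below is a minimal Python sketch of both, assuming Python's re behaves like PCRE here. The event string is hypothetical, assembled from the fragments quoted in the question; the real ReturnUrl value is truncated in the post.

```python
import re

# Hypothetical event built from the URL fragments quoted above.
EVENT = ("Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/"
         "abc/CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx))")

# Greedy: [^\)]+ consumes up to ")" and backtracks to the LAST ".aspx".
GREEDY = re.compile(r"GET:(?P<url>[^\)]+\.aspx)")

# Lazy: [^\)]+? grows one character at a time, so it stops at the FIRST ".aspx".
LAZY = re.compile(r"GET:(?P<url>[^\)]+?\.aspx)")

# Alternative: "?" is not allowed, so the match cannot reach the query string.
NO_QUERY = re.compile(r"GET:(?P<url>[^\)\?]+\.aspx)")


def extract(pattern, event=EVENT):
    """Return the 'url' capture, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("url") if m else None


for pattern in (GREEDY, LAZY, NO_QUERY):
    print(extract(pattern))
```

In a Splunk rex command, the lazy version would presumably just add "?" after the "+", for example rex "GET:(?<url>[^\)]+?\.aspx)". Test it against your own events before relying on it.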


burkmat
Engager

All current answers rely on the HTTP request being a GET request. HTTP has several request methods (GET, POST, and HEAD being the most common), and if you want to capture all URLs, you need to take this into consideration.

The following regex would probably be a better choice for catching all HTTP methods and all URLs, regardless of unusual formats (assuming no GET parameters are appended to the URL; if they are, you need to account for them):

(?i)\(Request \([A-Z]+:(?<fieldname>.*\.(aspx|xap))\)\)$
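Here is a minimal Python sketch of this method-agnostic pattern, assuming Python's re matches it the same way PCRE does. The only change is that (?<fieldname>...) is rewritten as (?P<fieldname>...), because Python does not accept the former. The POST event is hypothetical and is there only to show that the method is not hard-coded.

```python
import re

# The pattern above, with Python's named-group syntax. (?i) also makes
# [A-Z]+ match lowercase method names.
METHOD_AGNOSTIC = re.compile(r"(?i)\(Request \([A-Z]+:(?P<fieldname>.*\.(aspx|xap))\)\)$")

EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    # Hypothetical POST request to the same page.
    "Entering monitored scope (Request (POST:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]


def extract(event):
    """Return the 'fieldname' capture, or None if the pattern does not match."""
    m = METHOD_AGNOSTIC.search(event)
    return m.group("fieldname") if m else None


for event in EVENTS:
    print(extract(event))
```

The GET and POST events both yield the .aspx URL. The .xap event returns None because the pattern requires "))" right after the extension and the line ends with "?ver=5.19))". This is the parameter caveat mentioned above.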

Ayn
Legend

The regex should cover that. As burkmat said, though, it does not cover parameters.


ChhayaV
Communicator

Hi,
It's working, but how can I extract word.aspx, word.word.word.xap, word.xap, and all other possible combinations of words and dots (.)?


kristian_kolb
Ultra Champion

This should work:

rex "\(GET:(?<fieldname>[^\)]+\.(xap|aspx))"
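Here is a minimal Python sketch of this rex pattern applied to the sample events, assuming Python's re behaves like Splunk's PCRE engine here. The named group is rewritten as (?P<fieldname>...). The fourth event is hypothetical, added to show a dotted file name like the word.word.word.xap case from the question. Because [^\)]+ accepts any character except ")", dots in the file name are no problem.

```python
import re

# The rex pattern above, with Python's named-group syntax.
PATTERN = re.compile(r"\(GET:(?P<fieldname>[^\)]+\.(xap|aspx))")

EVENTS = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
    # Hypothetical event with a dotted file name.
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PROD/word.word.word.xap))",
]


def extract(event):
    """Return the 'fieldname' capture, or None if the pattern does not match."""
    m = PATTERN.search(event)
    return m.group("fieldname") if m else None


for event in EVENTS:
    print(extract(event))
```

The .jpg event is skipped. Note that the capture ends at the extension, so the ?ver=5.19 query string from the original example is not included. If you need it, the pattern would have to be extended to cover the query string.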
