Splunk Search

Regex for URL parsing

ChhayaV
Communicator

Hi,

I want to extract URLs from the events as a separate field.

Here is the log file:

04/15/2013 17:51:58.09  w3wp.exe (0x113C)                           0x3D50  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))
04/15/2013 17:51:58.26  w3wp.exe (0x113C)                           0x4AA0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg)) 
04/15/2013 17:59:25.20  w3wp.exe (0x113C)                           0x14B0  SharePoint Foundation           Monitoring                      nasq    Medium      Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))

Here I just want to extract the URLs that end with .aspx and .xap, like:
https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx
https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19

If I write the regex as (?i)\(GET:(?P< FIELDNAME>[^\?]+), the URL is not extracted properly.

Please help with the regex.
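
To make the symptom concrete, here is a minimal sketch that runs the pattern from the question against the three sample events. It uses Python's `re` module as a stand-in for Splunk's regex engine, which is an assumption: the constructs used here should behave the same way, but this is not Splunk itself. The space after `(?P<` is removed, since Python rejects it as an invalid group name, and the events are shortened to the part that matters. Because `[^\?]+` only stops at a `?` or at the end of the event, the closing `))` ends up in the field and the .jpg URL is captured as well.

```python
import re

# Sample events from the post, shortened to the tail of each line.
events = [
    "(Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "(Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg))",
    "(Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

# The pattern from the question, with the space in the group name removed.
original = re.compile(r"(?i)\(GET:(?P<FIELDNAME>[^\?]+)")

# Every event matches, because nothing in the pattern requires .aspx or .xap.
captured = [original.search(e).group("FIELDNAME") for e in events]
for value in captured:
    print(value)
```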

1 Solution

MHibbin
Influencer

I'm not sure your second example is an .aspx file, but I'm not a web developer. However, the following regex will capture URLs that end in ".aspx":

"GET:\w+://(?P<url>[^\)]+\.aspx)"

You can try out regular expressions on the following site, which is a handy tool:

http://gskinner.com/RegExr/

Hope this helps.

ChhayaV
Communicator

Hi,
I want to restrict my regex to the first match only.

Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/abc/CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthent...). Execution Time=17.1800154751023
If this is my log entry, I should get only "LoginPage.aspx", but the result is "LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx".
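
The behaviour described here comes from greedy matching: `[^\)]+` takes as many characters as it can, so the match ends at the last ".aspx" before the closing parenthesis. A possible fix, sketched below with Python's `re` module (an assumption, standing in for Splunk's regex engine), is a lazy quantifier `+?`, or a pattern that captures only the last path segment. The event string is a hypothetical stand-in for the truncated log entry: it ends the URL where the quoted result ends, and the `page` pattern is an illustration, not something proposed in the thread.

```python
import re

# Hypothetical stand-in for the truncated event: the URL ends where the quoted result ends.
event = ("Leaving Monitored Scope (Request (GET:https://www.abc/_layouts/ClientPortal/abc/"
         "CustomPages/LoginPage.aspx?ReturnUrl=%2f_layouts%2fAuthenticate.aspx))")

# Greedy: [^\)]+ grabs as much as possible, so the match ends at the LAST ".aspx".
greedy = re.search(r"GET:\w+://(?P<url>[^\)]+\.aspx)", event).group("url")

# Lazy: +? grabs as little as possible, so the match ends at the FIRST ".aspx".
lazy = re.search(r"GET:\w+://(?P<url>[^\)]+?\.aspx)", event).group("url")

# Page name only: the path segment before the query string, with no "/", "?" or ")" in it.
page = re.search(r"GET:[^?)]*/(?P<page>[^/?)]+\.aspx)", event).group("page")

print(greedy)
print(lazy)
print(page)
```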

burkmat
Engager

All current answers rely on the HTTP request being a GET request. HTTP has several request methods (GET, POST, and HEAD being the most common), and if you want all URLs to be captured, you need to take this into consideration.

The following regex would probably be a better choice to catch all HTTP methods and all URLs regardless of unusual formats. It assumes no query parameters are appended to the URL; if they are, you need to account for them.

(?i)\(Request \([A-Z]+:(?<fieldname>.*\.(aspx|xap))\)\)$
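
As a rough check of the pattern above, the sketch below runs it in Python's `re` module (an assumption, standing in for Splunk's regex engine; Python needs the `(?P<name>...)` spelling for named groups). The POST event is hypothetical and not from the original log. The `with_query` variant is also an assumption, not part of the answer: it shows one way the optional query string could be included, at the cost of dropping the `$` anchor.

```python
import re

# The pattern from this answer, with the named group in Python spelling.
strict = re.compile(r"(?i)\(Request \([A-Z]+:(?P<fieldname>.*\.(aspx|xap))\)\)$")

# Assumed variant: an optional query string after the extension is captured too.
with_query = re.compile(
    r"(?i)\(Request \([A-Z]+:(?P<fieldname>[^\s?)]+\.(?:aspx|xap)(?:\?[^\s)]*)?)\)\)"
)

events = [
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    # Hypothetical POST event, not from the original log.
    "Entering monitored scope (Request (POST:https://www.abc.co.in:443/GEOMETRIC/SitePages/MyEnrollment.aspx))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/PublicSite/images/header.jpg))",
    "Entering monitored scope (Request (GET:https://www.abc.co.in:443/_LAYOUTS/ClientPortal/SilverlightWebParts/PROD/MyBenefits.xap?ver=5.19))",
]

def extract(pattern, event):
    """Return the captured URL, or None when the event does not match."""
    match = pattern.search(event)
    return match.group("fieldname") if match else None

# The strict pattern misses the .xap event because "?ver=5.19" sits before "))".
strict_hits = [extract(strict, e) for e in events]
query_hits = [extract(with_query, e) for e in events]

for event, s, q in zip(events, strict_hits, query_hits):
    print(s, "|", q)
```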

Ayn
Legend

The regex should cover that. It does not cover parameters though, like burkmat said.

ChhayaV
Communicator

Hi,
It's working, but how can I extract word.aspx, word.word.word.xap, word.xap, and all other possible combinations of words and dots (.)?

kristian_kolb
Ultra Champion

This should work:

rex "\(GET:(?<fieldname>[^\)]+\.(xap|aspx))"
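
A quick way to see that this pattern handles file names with any number of dots is to try it outside Splunk. The sketch below uses Python's `re` module (an assumption, standing in for Splunk's regex engine, with the named group rewritten as `(?P<fieldname>...)`). The URLs are hypothetical and only mirror the "word.aspx" and "word.word.word.xap" shapes asked about earlier in the thread.

```python
import re

# The pattern from the rex command above, with the named group in Python spelling.
pattern = re.compile(r"\(GET:(?P<fieldname>[^\)]+\.(xap|aspx))")

# Hypothetical events with one or several dots in the file name.
events = [
    "(Request (GET:https://www.abc.co.in:443/Pages/word.aspx))",
    "(Request (GET:https://www.abc.co.in:443/Pages/word.word.word.xap))",
    "(Request (GET:https://www.abc.co.in:443/Pages/header.jpg))",
]

# [^\)]+ accepts dots, so dotted names are captured; the .jpg event yields None.
urls = []
for event in events:
    match = pattern.search(event)
    urls.append(match.group("fieldname") if match else None)

print(urls)
```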
