Hi All - I am having trouble extracting the following field from a GET request.
GET **/TSGene/**images/literature.jpg
I tried the following, but it did not seem to work:
\bGET\s+\K\S+(\/[\/[:word:]\-\.\=\&\?]+)\s
I just want to extract the part highlighted above. Thanks in advance!
Thanks,
Deepthi
This should get you what you want:
| rex "\"GET (?P<url>\/.*?[\/ ])" | eval url=trim(url)
This will match in the case of an additional / and in the case where there isn't a second /. If there is no / then there will be a trailing space in the url so I added a trim to remove it. A fancier regex could probably remove the need for the trim but this works.
I'm a little confused about what you want to do with POSTs. In your example above, you still parsed POSTs, but maybe that was just an oversight. I would suggest filtering them out so you are only processing events that contain "GET followed by a space (the leading double quote is part of the match). If you don't filter them out, the "url" field will be NULL for those events since the regex will not match.
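To see how the lazy match and the trim interact, here is a minimal sketch in Python rather than SPL. It assumes Python's re module behaves like Splunk's PCRE engine for this particular pattern; the function name extract_get_url is made up for illustration, and strip() stands in for trim(url).

```python
import re

# Same pattern as the rex above: the lazy .*? stops at the first "/" or space
# that follows the leading slash of the path.
GET_URL = re.compile(r'"GET (?P<url>/.*?[/ ])')

def extract_get_url(event):
    """Return the first path segment of a GET event, or None if no match."""
    match = GET_URL.search(event)
    if match is None:
        return None  # POST events do not match, so the field stays empty
    return match.group("url").strip()  # strip() plays the role of trim(url)

# Sample events copied from this thread (raw strings keep the literal \r\n).
samples = [
    r'"HTTPS","gene=5781","GET /TSGene/gene_general.cgi?gene=5781 HTTP/1.1\r\n',
    r'"HTTPS","","GET /favicon.ico HTTP/1.1\r\n',
    r'"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n',
]

for event in samples:
    print(extract_get_url(event))
```

The first event stops at the second slash, the second stops at the space before HTTP (hence the trim), and the POST event yields nothing.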
Try this. Your field name will be GET:
| rex "(?<GET>GET\s\S+\.jpg)"
Sorry if I wasn't clear. I only want the following part extracted: the data between the first pair of slashes after GET, including the slashes themselves.
Extracted data -
/TSGene/
/TSGene/
/favicon.ico
/TSGene/
/static/
/static/
/orl/
Actual requests -
"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n
"HTTPS","gene=5781","GET /TSGene/gene_general.cgi?gene=5781 HTTP/1.1\r\n
"HTTPS","","GET /favicon.ico HTTP/1.1\r\n
"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n
"HTTPS","ver=20142803","GET /static/wp-content/plugins/fruitful-shortcodes/includes/shortcodes/js/tabs/easyResponsiveTabs.js?ver=20142803 HTTP/1.1\r\n
"HTTPS","ver=1.11.4","GET /static/wp-includes/js/jquery/ui/slider.min.js?ver=1.11.4 HTTP/1.1\r\n
"HTTPS","","GET /orl/wp-content/themes/utms-orl/images/common/prefooter-bg.jpg HTTP/1.1\r\n
Try this one:
| rex field=_raw "(?<=POST|GET)\s(?<yourfield>\/[^\/]*)"
Thanks, that works better, but in some cases it picks up the HTTP that follows the request.
Can this be modified to extract like this?
"HTTPS","","GET /favicon.ico HTTP/1.1\r\n -> only /favicon.ico should be extracted.
At this time, it extracts the following -> /favicon.ico HTTP
Thanks in advance!
Yes, just exclude whitespace in the rex too:
| rex field=_raw "(?<=POST|GET)\s(?<yourfield>\/[^\/\s]*)"
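As a rough check against the sample requests in this thread, the same idea can be sketched in Python. Two assumptions differ from the rex above: Python's re module rejects a lookbehind whose alternatives have different lengths, such as (?<=POST|GET), so a non-capturing group is used instead; and an optional trailing slash is added so the result includes the closing / the original poster asked for. The names FIRST_SEGMENT and first_segment are made up for illustration.

```python
import re

# After GET or POST and whitespace, capture "/" plus everything up to the
# next "/" or whitespace, and keep the closing "/" when there is one.
FIRST_SEGMENT = re.compile(r'(?:GET|POST)\s+(?P<segment>/[^/\s]*/?)')

def first_segment(event):
    """Return the first path segment of the request, or None if no match."""
    match = FIRST_SEGMENT.search(event)
    return match.group("segment") if match else None

# The seven sample events posted in this thread (raw strings keep \r\n literal).
events = [
    r'"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n',
    r'"HTTPS","gene=5781","GET /TSGene/gene_general.cgi?gene=5781 HTTP/1.1\r\n',
    r'"HTTPS","","GET /favicon.ico HTTP/1.1\r\n',
    r'"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n',
    r'"HTTPS","ver=20142803","GET /static/wp-content/plugins/fruitful-shortcodes/includes/shortcodes/js/tabs/easyResponsiveTabs.js?ver=20142803 HTTP/1.1\r\n',
    r'"HTTPS","ver=1.11.4","GET /static/wp-includes/js/jquery/ui/slider.min.js?ver=1.11.4 HTTP/1.1\r\n',
    r'"HTTPS","","GET /orl/wp-content/themes/utms-orl/images/common/prefooter-bg.jpg HTTP/1.1\r\n',
]

segments = [first_segment(event) for event in events]
print(segments)
```

If the pattern carries over to rex unchanged (not verified here), the equivalent capture would be (?:GET|POST)\s+(?<yourfield>\/[^\/\s]*\/?).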
Can you please paste a full example of the GET request?
Sure - some more samples of GET and POST
"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n
"HTTPS","gene=5781","GET /TSGene/gene_general.cgi?gene=5781 HTTP/1.1\r\n
"HTTPS","","GET /favicon.ico HTTP/1.1\r\n
"HTTPS","","POST /TSGene/search_result.cgi HTTP/1.1\r\n
"HTTPS","ver=20142803","GET /static/wp-content/plugins/fruitful-shortcodes/includes/shortcodes/js/tabs/easyResponsiveTabs.js?ver=20142803 HTTP/1.1\r\n
"HTTPS","ver=1.11.4","GET /static/wp-includes/js/jquery/ui/slider.min.js?ver=1.11.4 HTTP/1.1\r\n
"HTTPS","","GET /orl/wp-content/themes/utms-orl/images/common/prefooter-bg.jpg HTTP/1.1\r\n
Some logs have a version number in between.