Extract url up to question mark

johnansett
Communicator

Hello,
I am trying to extract the entire URL up to, but not including, the question mark. Generally the data will look like this:

{"cf_app_id":"6304b330-c026-4ea2-a6cf-41226d5357ad","cf_app_name":"app","cf_ignored_app":false,"cf_org_id":"ff8d1329-74e1-4d13-852f-5cea389de951","cf_org_name":"apporg","cf_origin":"firehose","cf_space_id":"79a0055d-36ba-4051-b3ea-825023d617b2","cf_space_name":"prod-web","deployment":"p-isolation-segment-dbd885e4d164ead74648","event_type":"LogMessage","ip":"192.168.1.1","job":"isolated_router","job_index":"6cbb1296-8dac-4f14-859e-63292ea984e8","message_type":"OUT","msg":"app.web.state.bizsunit.company.com - [2019-07-09T03:38:28.088+0000] \"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134\" \"192.168.1.1:35268\" \"192.168.9.12:1010\" x_forwarded_for:\"192.168.8.1, 192.168.8.11\" x_forwarded_proto:\"https\" vcap_request_id:\"af7f0c6f-eff3-48c4-5f33-2e50c81e1104\" response_time:0.650701403 app_id:\"6304b330-c026-4ea2-a6cf-41226d5357ad\" app_index:\"1\" x_request_id:\"6bdccae0-300f-4acf-9772-4264b18b7db4\" x_b3_traceid:\"20fc67710967dfb2\" x_b3_spanid:\"20fc67710967dfb2\" x_b3_parentspanid:\"-\"\n","origin":"gorouter","source_instance":"1","source_type":"RTR","timestamp":1562643508739740164}

The URL inside msg can look like any of these:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250

I want to extract these as the URL:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact

I tried this:

(?<msg_uri>https:\/\/[^?]+)

This seems to match on regexr, but it does not extract correctly in Splunk.

Any help is appreciated!


woodcock
Esteemed Legend

Try this:

(?<url>https?:\/\/?[^\s\?]+)
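
A pattern like this can be sanity-checked outside Splunk with a short Python sketch. Several things here are assumptions rather than part of the original answer: Python's re module spells named groups as (?P<url>...) instead of the (?<url>...) form used with rex, the scheme is placed inside the capture group so that the extracted value is the full URL, and extract_url is a hypothetical helper name. The sample strings are the three URLs from the question, which have a single slash after "https:".

```python
import re

# Python spelling of the pattern: (?P<url>...) instead of (?<url>...).
# The second slash is optional because the sample data contains "https:/".
URL_RE = re.compile(r"(?P<url>https?://?[^\s?]+)")

samples = [
    "https:/app.web.state.bizsunit.company.com/apps2",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\\u0026zip=78250",
]


def extract_url(text):
    """Return the URL up to (not including) the first '?', or None if absent."""
    match = URL_RE.search(text)
    return match.group("url") if match else None


for sample in samples:
    print(extract_url(sample))
```

The character class stops at whitespace or a question mark, so the query string in the third sample is dropped while the first two come back unchanged.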

tiagofbmm
Influencer

Maybe you need to escape the ?

| rex field=_raw "(?<url>https[^\?]*)"
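
Two details of the sample data may matter here, and the Python sketch below tries to illustrate them. It is only a rough check under stated assumptions: Python's re module needs (?P<name>...) for named groups, the test line is a hypothetical shortened fragment of the msg field with a URL that has no query string, and the "stricter" pattern is one possible variant, not something taken from the thread. First, the sample URLs have a single slash after "https:", so a pattern that requires "//" finds nothing. Second, a class that excludes only "?" keeps going past the end of the URL when no question mark follows.

```python
import re

# Hypothetical shortened fragment of the raw msg text; quotes appear as \" in the raw JSON.
line = r'\"https:/app.web.state.bizsunit.company.com/apps2\" \"Mozilla/5.0\"'

# Pattern from the question (Python named-group spelling): requires two slashes.
double_slash = re.compile(r"(?P<msg_uri>https://[^?]+)")

# Pattern that excludes only '?': runs on when the URL has no query string.
only_qmark = re.compile(r"(?P<url>https[^?]*)")

# Assumed stricter variant: optional second slash; stops at whitespace,
# '?', a double quote, or a backslash.
stricter = re.compile(r'(?P<url>https?://?[^\s?"\\]+)')

no_match = double_slash.search(line)
greedy = only_qmark.search(line).group("url")
clean = stricter.search(line).group("url")

print(no_match)  # the single-slash URL is not matched at all
print(greedy)    # continues past the URL into the user-agent text
print(clean)     # just the URL
```

Whether the extra exclusions are needed depends on whether rex is run against _raw or against an already extracted msg field, so treat the stricter class as a starting point to adjust.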
