Extract URL up to question mark

johnansett
Communicator

Hello,
I am trying to extract the entire URL up to, but not including, the question mark (when there is one). The data generally looks like this:

{"cf_app_id":"6304b330-c026-4ea2-a6cf-41226d5357ad","cf_app_name":"app","cf_ignored_app":false,"cf_org_id":"ff8d1329-74e1-4d13-852f-5cea389de951","cf_org_name":"apporg","cf_origin":"firehose","cf_space_id":"79a0055d-36ba-4051-b3ea-825023d617b2","cf_space_name":"prod-web","deployment":"p-isolation-segment-dbd885e4d164ead74648","event_type":"LogMessage","ip":"192.168.1.1","job":"isolated_router","job_index":"6cbb1296-8dac-4f14-859e-63292ea984e8","message_type":"OUT","msg":"app.web.state.bizsunit.company.com - [2019-07-09T03:38:28.088+0000] \"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134\" \"192.168.1.1:35268\" \"192.168.9.12:1010\" x_forwarded_for:\"192.168.8.1, 192.168.8.11\" x_forwarded_proto:\"https\" vcap_request_id:\"af7f0c6f-eff3-48c4-5f33-2e50c81e1104\" response_time:0.650701403 app_id:\"6304b330-c026-4ea2-a6cf-41226d5357ad\" app_index:\"1\" x_request_id:\"6bdccae0-300f-4acf-9772-4264b18b7db4\" x_b3_traceid:\"20fc67710967dfb2\" x_b3_spanid:\"20fc67710967dfb2\" x_b3_parentspanid:\"-\"\n","origin":"gorouter","source_instance":"1","source_type":"RTR","timestamp":1562643508739740164}

The URLs I need to handle look like these:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250

I want to extract these as the URL:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact

I tried this:

(?<msg_uri>https:\/\/[^?]+)

It seems to match on regexr, but it doesn't extract correctly in Splunk.

Any help is appreciated!
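One likely reason the pattern misses: in the sample event the referrer URL has only one slash after the scheme (https:/app.web...), while https:\/\/ requires two. The sketch below checks this offline with Python's standard re module. It is an illustration, not Splunk itself: Python spells named groups (?P<name>...) instead of (?<name>...), and EVENT is a fragment copied from the sample event above.

```python
import re

# Fragment of the sample event's msg value, copied as it appears in _raw
# (JSON escapes such as \" and \u0026 are left in place by the raw string).
EVENT = r'\"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0'

# The pattern from the question, translated to Python's named-group syntax.
two_slashes = re.compile(r'(?P<msg_uri>https:\/\/[^?]+)')

# Same idea, but the second slash is optional (\/?).
one_or_two = re.compile(r'(?P<msg_uri>https:\/\/?[^?]+)')

print(two_slashes.search(EVENT))  # None: the data contains "https:/", not "https://"
print(one_or_two.search(EVENT).group('msg_uri'))
```

The second pattern returns https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact for this fragment. Note that [^?]+ alone would keep matching past the end of a URL that has no query string, so a stop on whitespace is still worth adding.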


woodcock
Esteemed Legend

Try this:

(?<url>https?:\/\/?[^\s\?]+)
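A quick offline check of this approach, as a sketch in Python's re module rather than Splunk: the named group is written (?P<url>...) in Python and wraps the whole URL so the scheme is kept. The helper name extract_url is hypothetical, and the inputs are the three sample URLs from the question.

```python
import re

# Python translation of the suggested pattern: the "s" in https is optional,
# the second slash is optional (the data has "https:/"), and the match stops
# at the first whitespace character or "?".
URL_RE = re.compile(r'(?P<url>https?:\/\/?[^\s\?]+)')

samples = [
    'https:/app.web.state.bizsunit.company.com/apps2',
    'https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff',
    r'https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250',
]

def extract_url(text):
    """Hypothetical helper: return the first URL in text, cut before any '?', or None."""
    m = URL_RE.search(text)
    return m.group('url') if m else None

for s in samples:
    print(extract_url(s))
```

Each sample yields the corresponding URL from the "I want to extract these" list. Stopping at whitespace as well as ? keeps a URL without a query string from running into the rest of the event.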

tiagofbmm
Influencer

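Maybe you need to escape the ?. This pattern also doesn't require a double slash after https, which your data doesn't have (https:/app...):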

| rex field=_raw "(?<url>https[^\?]*)"
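Inside a character class, ? is already a literal in PCRE (which Splunk's rex uses) and in Python, so [^\?] and [^?] behave the same. The sketch below checks that with Python's re module on the question's sample URLs; it is an illustration, not Splunk, and uses Python's (?P<url>...) group syntax.

```python
import re

samples = [
    'https:/app.web.state.bizsunit.company.com/apps2',
    'https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff',
    r'https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250',
]

escaped = re.compile(r'(?P<url>https[^\?]*)')   # pattern from the answer
unescaped = re.compile(r'(?P<url>https[^?]*)')  # same class without the backslash

for s in samples:
    a = escaped.search(s).group('url')
    b = unescaped.search(s).group('url')
    print(a, a == b)  # both patterns give the same match
```

Because this class excludes only ?, a URL with no query string inside a full event would keep matching up to the next ? elsewhere in the event; adding \s to the class (as in [^\s\?]) stops it at the next whitespace instead.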
