Splunk Search

Extract url up to question mark

johnansett
Communicator

Hello,
I am trying to extract the entire URL up to, but not including, the question mark. Generally the data will look like this:

{"cf_app_id":"6304b330-c026-4ea2-a6cf-41226d5357ad","cf_app_name":"app","cf_ignored_app":false,"cf_org_id":"ff8d1329-74e1-4d13-852f-5cea389de951","cf_org_name":"apporg","cf_origin":"firehose","cf_space_id":"79a0055d-36ba-4051-b3ea-825023d617b2","cf_space_name":"prod-web","deployment":"p-isolation-segment-dbd885e4d164ead74648","event_type":"LogMessage","ip":"192.168.1.1","job":"isolated_router","job_index":"6cbb1296-8dac-4f14-859e-63292ea984e8","message_type":"OUT","msg":"app.web.state.bizsunit.company.com - [2019-07-09T03:38:28.088+0000] \"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134\" \"192.168.1.1:35268\" \"192.168.9.12:1010\" x_forwarded_for:\"192.168.8.1, 192.168.8.11\" x_forwarded_proto:\"https\" vcap_request_id:\"af7f0c6f-eff3-48c4-5f33-2e50c81e1104\" response_time:0.650701403 app_id:\"6304b330-c026-4ea2-a6cf-41226d5357ad\" app_index:\"1\" x_request_id:\"6bdccae0-300f-4acf-9772-4264b18b7db4\" x_b3_traceid:\"20fc67710967dfb2\" x_b3_spanid:\"20fc67710967dfb2\" x_b3_parentspanid:\"-\"\n","origin":"gorouter","source_instance":"1","source_type":"RTR","timestamp":1562643508739740164}

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250

I want to extract these as the URL:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact

I tried this:

(?<msg_uri>https:\/\/[^?]+)

This seems to match in RegExr, but it does not work correctly in Splunk.

Any help is appreciated!


woodcock
Esteemed Legend

Try this:

(?<url>https?:\/\/?[^\s\?]+)
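
As a quick sanity check outside Splunk, the same idea can be run against the three sample URLs from the question. This is only a sketch in Python, not Splunk itself: Python writes named groups as (?P<name>...) where rex uses (?<name>...), and the names URL_RE and extract_url are made up for illustration. The optional second slash (//?) is there because the sample data shows "https:/" with a single slash.

```python
import re

# Python spells named groups (?P<name>...); Splunk's rex uses (?<name>...).
# "//?" accepts one or two slashes, since the sample data shows "https:/".
URL_RE = re.compile(r"(?P<url>https?://?[^\s?]+)")

# The three sample URLs from the question (\u0026 kept as literal text).
samples = [
    "https:/app.web.state.bizsunit.company.com/apps2",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\\u0026zip=78250",
]


def extract_url(text):
    """Return the URL up to (not including) the first '?', or None."""
    match = URL_RE.search(text)
    return match.group("url") if match else None


for sample in samples:
    print(extract_url(sample))
```

The first two samples come back unchanged, and the third is cut just before the question mark.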

tiagofbmm
Influencer

Maybe you forgot to escape the question mark:

| rex field=_raw "(?<url>https[^\?]*)"
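
One thing to watch when matching against the whole raw event: a character class that excludes only the question mark keeps going past the end of a URL that has no query string, because nothing else stops it. The Python sketch below illustrates this on a shortened, hypothetical fragment of a raw event (host.example and the variable names are assumptions, not from the original data). Also excluding whitespace, the double quote, and the backslash bounds the match. In a rex string, the double quote and backslash would likely need their own escaping, so treat the bounded pattern as a starting point rather than a drop-in.

```python
import re

# Hypothetical, shortened fragment of a raw event. Quotes inside the JSON
# "msg" value appear as \" in _raw, so the URL is followed by a backslash.
raw = r'\"POST /api/contact HTTP/1.1\" 200 \"https:/host.example/apps2/biz/ecomm/stuff\" \"Mozilla/5.0 (Windows NT 10.0)\"'

# Excludes only '?': with no '?' after the URL, the match runs to the end.
loose = re.compile(r"(?P<url>https[^?]*)")

# Also stops at whitespace, a double quote, or a backslash.
bounded = re.compile(r'(?P<url>https?://?[^\s?"\\]+)')

loose_url = loose.search(raw).group("url")
bounded_url = bounded.search(raw).group("url")

print(loose_url)    # runs on past the URL into the user-agent text
print(bounded_url)  # stops at the end of the URL
```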
