Extract url up to question mark

johnansett
Communicator

Hello,
I am trying to extract the URL up to, but not including, the question mark that starts the query string. The data generally looks like this:

{"cf_app_id":"6304b330-c026-4ea2-a6cf-41226d5357ad","cf_app_name":"app","cf_ignored_app":false,"cf_org_id":"ff8d1329-74e1-4d13-852f-5cea389de951","cf_org_name":"apporg","cf_origin":"firehose","cf_space_id":"79a0055d-36ba-4051-b3ea-825023d617b2","cf_space_name":"prod-web","deployment":"p-isolation-segment-dbd885e4d164ead74648","event_type":"LogMessage","ip":"192.168.1.1","job":"isolated_router","job_index":"6cbb1296-8dac-4f14-859e-63292ea984e8","message_type":"OUT","msg":"app.web.state.bizsunit.company.com - [2019-07-09T03:38:28.088+0000] \"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134\" \"192.168.1.1:35268\" \"192.168.9.12:1010\" x_forwarded_for:\"192.168.8.1, 192.168.8.11\" x_forwarded_proto:\"https\" vcap_request_id:\"af7f0c6f-eff3-48c4-5f33-2e50c81e1104\" response_time:0.650701403 app_id:\"6304b330-c026-4ea2-a6cf-41226d5357ad\" app_index:\"1\" x_request_id:\"6bdccae0-300f-4acf-9772-4264b18b7db4\" x_b3_traceid:\"20fc67710967dfb2\" x_b3_spanid:\"20fc67710967dfb2\" x_b3_parentspanid:\"-\"\n","origin":"gorouter","source_instance":"1","source_type":"RTR","timestamp":1562643508739740164}

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250

I want to extract these as the URL:

https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact

I tried this:

(?<msg_uri>https:\/\/[^?]+)

This seems to match on RegExr, but it does not extract correctly in Splunk.

Any help is appreciated!


woodcock
Esteemed Legend

Try this:

(?<url>https?:\/\/?[^\s\?]+)

tiagofbmm
Influencer

Maybe you're missing the escape on the ?:

| rex field=_raw "(?<url>https[^\?]*)"
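
One way to check a candidate pattern outside Splunk is a short Python script. This is only a sketch under a few assumptions: Python's re module stands in for the PCRE engine behind rex, named groups are spelled (?P<url>...) in Python rather than (?<url>...), and the helper name extract_url is made up for illustration. Note that the sample URLs as posted have a single slash after "https:" (possibly an artifact of anonymizing the data), so the pattern accepts one or two slashes. It also stops at a double quote or backslash, because in the raw event a URL without a query string is terminated by \" rather than by a question mark.

```python
import re

# Assumption: Python's re approximates Splunk's PCRE for this pattern.
# Python writes named groups as (?P<name>...); Splunk's rex uses (?<name>...).
# The class stops at whitespace, '?', '"' or a backslash, whichever comes first.
URL_PATTERN = re.compile(r'(?P<url>https?:/{1,2}[^\s?"\\]+)')


def extract_url(text):
    """Return the first URL found in text, cut off before any '?', or None."""
    match = URL_PATTERN.search(text)
    return match.group("url") if match else None


# The three example URLs from the question.
samples = [
    "https:/app.web.state.bizsunit.company.com/apps2",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\\u0026zip=78250",
]

# A fragment shaped like the raw event, where the URL sits between escaped quotes.
raw_fragment = r'200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2\" \"Mozilla/5.0'

for sample in samples + [raw_fragment]:
    print(extract_url(sample))
```

If this behaves as intended on your data, the equivalent character class could be tried in rex with Splunk's own escaping rules for quotes and backslashes inside the search string.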
