Head scratcher regex

dbcase
Motivator

Hi

I have the data below and need to extract three things. Two of them are easy, and I can already extract them: method (GET or POST) and responseStatus (a numeric value). The one I'm having trouble with is the last segment of the URL. For example, from the first line I'd like to extract obtainToken, from the second line address, from the third line roost, and from the fourth line getAllLightingStatus. I've tried several variations with partial success, but none of them works for every line.

{"method":"POST","url":"/rest/blah/sites/1004057/obtainToken?tokenType=WS_CVR","params":{},"requestStartTime":1516982978230,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979338} 

{"method":"GET","url":"/rest/blah/sites/1004057/address","params":{},"requestStartTime":1516982978142,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982978901}   

{"method":"POST","url":"/rest/blah/sites/1004057/cloudIntegrations/roost","params":{"method":"POST","path":"/iCtrlGetDeviceStatus"},"requestStartTime":1516982978032,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979118}    

{"method":"GET","url":"/rest/blah/sites/1004057/network/lights/getAllLightingStatus","params":{},"requestStartTime":1516982978146,"responseStatus":500,"responseStatusText":"Internal Server Error","success":false,"responseTime":1516982978914,"data":"Device not connected to server"}
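To pin down the expected results outside Splunk, here is a minimal Python sketch (not a Splunk technique, just a reference) that parses each sample event as JSON and pulls out the three values. The function name summarize is hypothetical; urlsplit drops the query string before the path is split on its last slash.

```python
import json
from urllib.parse import urlsplit

# The four sample events from the question, verbatim.
EVENTS = [
    '{"method":"POST","url":"/rest/blah/sites/1004057/obtainToken?tokenType=WS_CVR","params":{},"requestStartTime":1516982978230,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979338}',
    '{"method":"GET","url":"/rest/blah/sites/1004057/address","params":{},"requestStartTime":1516982978142,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982978901}',
    '{"method":"POST","url":"/rest/blah/sites/1004057/cloudIntegrations/roost","params":{"method":"POST","path":"/iCtrlGetDeviceStatus"},"requestStartTime":1516982978032,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979118}',
    '{"method":"GET","url":"/rest/blah/sites/1004057/network/lights/getAllLightingStatus","params":{},"requestStartTime":1516982978146,"responseStatus":500,"responseStatusText":"Internal Server Error","success":false,"responseTime":1516982978914,"data":"Device not connected to server"}',
]


def summarize(event_text):
    """Return (method, responseStatus, last URL path segment) for one JSON event."""
    event = json.loads(event_text)
    path = urlsplit(event["url"]).path  # drops any "?query" part
    last_segment = path.rsplit("/", 1)[-1]  # text after the final slash
    return event["method"], event["responseStatus"], last_segment


for event_text in EVENTS:
    print(summarize(event_text))
```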
Accepted solution

elliotproebstel
Champion

How about this for extracting the final segment of the URL into a field named final_segment:
"url":"[^"]+\/(?<final_segment>\w+)(\?[^\/"]+)?"

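One way to sanity-check this pattern outside Splunk is the Python sketch below, assuming Python's re module behaves like Splunk's PCRE for these particular constructs. Python spells named groups (?P<name>...), so that is the only change to the pattern; the helper final_segment is hypothetical.

```python
import re

# elliotproebstel's pattern; only the named-group spelling is changed for Python.
# It is applied to the raw event text, as a rex against _raw would be.
URL_FINAL_SEGMENT = re.compile(r'"url":"[^"]+\/(?P<final_segment>\w+)(\?[^\/"]+)?"')

# The four sample events from the question, verbatim.
EVENTS = [
    '{"method":"POST","url":"/rest/blah/sites/1004057/obtainToken?tokenType=WS_CVR","params":{},"requestStartTime":1516982978230,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979338}',
    '{"method":"GET","url":"/rest/blah/sites/1004057/address","params":{},"requestStartTime":1516982978142,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982978901}',
    '{"method":"POST","url":"/rest/blah/sites/1004057/cloudIntegrations/roost","params":{"method":"POST","path":"/iCtrlGetDeviceStatus"},"requestStartTime":1516982978032,"responseStatus":200,"responseStatusText":"OK","success":true,"responseTime":1516982979118}',
    '{"method":"GET","url":"/rest/blah/sites/1004057/network/lights/getAllLightingStatus","params":{},"requestStartTime":1516982978146,"responseStatus":500,"responseStatusText":"Internal Server Error","success":false,"responseTime":1516982978914,"data":"Device not connected to server"}',
]


def final_segment(raw_event):
    """Return the captured final_segment, or None if the pattern does not match."""
    match = URL_FINAL_SEGMENT.search(raw_event)
    return match.group("final_segment") if match else None


for raw_event in EVENTS:
    print(final_segment(raw_event))
```

The greedy [^"]+ backtracks to the last slash whose remainder fits \w+ plus an optional query string, which is why the query string on the first event does not end up in the capture.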

mbenwell
Communicator

Quick question: is there any reason you're writing regexes for the other fields in that data? It looks like a JSON object.

With the correct sourcetype (one that extracts the JSON fields), the url field should already exist, and you could then use a transform like the one below. (Hoping for feedback on its performance from @cpetterborg, the master of regex 🙂)

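# transforms.conf: search-time extraction, enabled from the sourcetype's props.conf stanza with a REPORT- setting
[extract_page_from_url]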
SOURCE_KEY = url
REGEX = \S+\/([^\/\?]+)
FORMAT = page::$1
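The transform's REGEX can be checked the same way. This Python sketch is only a stand-in for Splunk's extraction, assuming Python's re matches like PCRE here: it applies the pattern to the url values from the question and returns capture group 1, which is what FORMAT = page::$1 would assign to a field named page. The helper page is hypothetical.

```python
import re

# REGEX from the [extract_page_from_url] stanza. Because SOURCE_KEY = url,
# it runs against the url field's value, not the raw event.
PAGE_REGEX = re.compile(r'\S+\/([^\/\?]+)')

# url values from the question's sample events.
URLS = [
    "/rest/blah/sites/1004057/obtainToken?tokenType=WS_CVR",
    "/rest/blah/sites/1004057/address",
    "/rest/blah/sites/1004057/cloudIntegrations/roost",
    "/rest/blah/sites/1004057/network/lights/getAllLightingStatus",
]


def page(url):
    """Mimic FORMAT = page::$1 by returning capture group 1 (or None)."""
    match = PAGE_REGEX.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(page(url))
```

Here the greedy \S+ backtracks to the last slash that is followed by at least one character other than / or ?, and the capture stops at the ? that starts a query string.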

Also, check out the URL Toolbox app (https://splunkbase.splunk.com/app/2734/); it's very handy when working with URLs.

dbcase
Motivator

Wow, I feel like a babe in the woods! Thanks, guys!!!

cpetterborg
SplunkTrust

Just to account for characters other than \w in the final segment, and for a final segment that might be empty:

"url":"[^"]+\/(?<final_segment>[^"?]*)(\?[^\/"]+)?"

It's also about 20% more efficient.
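To see the difference cpetterborg describes, here is a small Python sketch comparing the two patterns (with Python's (?P<name>...) group spelling). The sample events are hypothetical, made up for this check: one last segment contains a hyphen, one URL ends in a slash, and one comes from the question. The sketch only checks matching behavior, not speed.

```python
import re

# Python spellings of the two patterns from this thread.
WORD_ONLY = re.compile(r'"url":"[^"]+\/(?P<final_segment>\w+)(\?[^\/"]+)?"')
ANY_CHARS = re.compile(r'"url":"[^"]+\/(?P<final_segment>[^"?]*)(\?[^\/"]+)?"')

# Hypothetical events (not from the question), except the last url value.
SAMPLES = [
    '{"url":"/rest/blah/sites/1004057/get-status?x=1"}',
    '{"url":"/rest/blah/sites/1004057/"}',
    '{"url":"/rest/blah/sites/1004057/obtainToken?tokenType=WS_CVR"}',
]


def extract(pattern, raw_event):
    """Return final_segment for one event, or None when the pattern fails."""
    match = pattern.search(raw_event)
    return match.group("final_segment") if match else None


for sample in SAMPLES:
    print(extract(WORD_ONLY, sample), extract(ANY_CHARS, sample))
```

With \w+, the hyphenated segment and the trailing-slash URL produce no match at all; with [^"?]* the first yields get-status and the second yields an empty final_segment.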

elliotproebstel
Champion

Thanks, guru! I appreciate the correction.
