Splunk Search

What is wrong with this regular expression to extract the URL from our logs?

harrisoncs
Explorer

I am attempting to extract the URL from our webfilter logs. The automatic field extraction process did not work. I now have a partially working expression and can't seem to find the reason it's not working. See below:

(?(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)

This expression returns only a couple of http URLs. It does not capture any https URLs, even though the preview shows plenty of candidates. Is there something simple I'm missing? One iteration had only https in the expression, but it returned no results. The sample data below, as it stands now, would not return a result because it is https.

Sample data (IPs have been changed)

"May 12 15:30:26 10.10.10.10 May 12 19:30:21 Sourcefire3D WFAccessURL: Protocol: TCP, SrcIP: 20.20.20.20, OriginalClientIP: ::, DstIP: 30.30.30.93, SrcPort: 64776, DstPort: 443, TCPFlags: 0x0, IngressInterface: Cisco, EgressInterface: outside, DE: Primary Detection Engine (dc1c2f78-185f-11e6-a6f7-dabf06bba1d5), Policy: SFR-Policy, ConnectType: Start, AccessControlRuleName: Unknown, AccessControlRuleAction: Allow, Prefilter Policy: Unknown, UserName: No Authentication Required, Client: SSL client, ApplicationProtocol: HTTPS, InitiatorPackets: 3, ResponderPackets: 1, InitiatorBytes: 715, ResponderBytes: 66, NAPPolicy: Balanced Security and Connectivity, DNSResponseType: No Error, Sinkhole: Unknown, URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com";

woodcock
Esteemed Legend

Why are you complicating it so much? Why not something like this:

(?:https|http|ftp)?:\/\/(?<URL>\S+)
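
To see what this pattern captures, here is a minimal sketch in Python rather than Splunk. It assumes Python's `re` behaves like Splunk's PCRE for this pattern, and it rewrites the named group as `(?P<URL>...)` because Python does not accept the `(?<URL>...)` spelling. The `sample` string is just the tail of the event from the question.

```python
import re

# Python's re module spells named groups (?P<name>...); the PCRE form
# (?<URL>...) from the answer is translated here.
URL_PATTERN = re.compile(r"(?:https|http|ftp)?:\/\/(?P<URL>\S+)")

# Tail of the sample event from the question.
sample = "URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com"

match = URL_PATTERN.search(sample)
# The named group starts after "://", so the scheme is not part of the capture.
url = match.group("URL") if match else None
print(url)
```

Because the scheme sits in a non-capturing group outside `URL`, the extracted value is the host and path only. Moving the scheme alternation inside the named group would keep it in the field.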

harrisoncs
Explorer

I wanted to accept all of the answers, but I accepted the one I used to accomplish my goal. I appreciate everyone's input. I started with regex101 last week and intend to use it to get me further along.

woodcock
Esteemed Legend

You can upvote any answer or comment (and should, if they helped or educated you at all).

ddrillic
Ultra Champion

You can start easy -

This one matches - (https|http|ftp):\/\/www.splunk.com

and then -

(https|http|ftp):\/\/([a-zA-Z0-9\.]*)
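
Building the pattern up one piece at a time, as above, points at one plausible culprit in the original expression: it requires a "/" after the host, and the sample URL has none. The sketch below uses Python's `re` as a stand-in for Splunk's PCRE (an assumption). The names `step1` to `step3` are made up for illustration, and `step3` is a hypothetical variation, not something proposed in the thread.

```python
import re

# URL from the question's sample event; note there is no path after the host.
sample = "URL: https://www.splunk.com"

# Step 1: scheme plus host only (host class taken from the question's pattern).
step1 = re.compile(r"(https|http|ftp)://[a-zA-Z0-9.\-_]+")
# Step 2: the question's pattern, which also demands a "/" after the host.
step2 = re.compile(r"(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*")
# Step 3 (hypothetical variation): make the "/path" part optional.
step3 = re.compile(r"(https|http|ftp)://[a-zA-Z0-9.\-_]+(/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)?")

# Record which steps find a match in the sample.
results = {
    name: bool(pattern.search(sample))
    for name, pattern in [("step1", step1), ("step2", step2), ("step3", step3)]
}
print(results)
```

If the few http URLs that did match all had a path or a trailing slash, that would be consistent with the mandatory "/" being the problem rather than the https alternative.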

This util is just sensational - regex101

grimlock
Path Finder

You need to escape special characters like slash and period.

Please reference the following link for special character list.
http://regular-expressions.mobi/characters.html?wlr=1

Hope that helps.
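
As a quick illustration of what escaping changes, here is a small sketch in Python's `re` module (an assumption; Splunk itself uses PCRE, and the test strings are made up). It shows that an unescaped period matches any character, while in Python an escaped and an unescaped slash match the same text.

```python
import re

text = "wwwXsplunkYcom"

# An unescaped "." matches any character, so this also matches text
# that is not a hostname at all.
loose = re.search(r"www.splunk.com", text)
# Escaping the period restricts it to a literal ".".
strict = re.search(r"www\.splunk\.com", text)

# In Python's re, "/" has no special meaning, so "\/" and "/" behave the same.
slash_plain = re.search(r"://", "https://www.splunk.com")
slash_escaped = re.search(r":\/\/", "https://www.splunk.com")

print(loose is not None, strict is None)
print(slash_plain.span() == slash_escaped.span())
```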
