I am attempting to extract the URL from our webfilter logs. The automatic field extraction process did not work. I now have a partially working expression and can't seem to find the reason it's not working. See below:
(?(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)
This expression returns only a couple of http URLs. It does not pick up any https URLs, even though the preview shows plenty of candidates. Is there something simple I'm missing? One iteration had only https in the expression, but it returned no results. As it stands, the sample data below would not return a result, since its URL is https.
Sample data (IPs have been changed)
"May 12 15:30:26 10.10.10.10 May 12 19:30:21 Sourcefire3D WFAccessURL: Protocol: TCP, SrcIP: 20.20.20.20, OriginalClientIP: ::, DstIP: 30.30.30.93, SrcPort: 64776, DstPort: 443, TCPFlags: 0x0, IngressInterface: Cisco, EgressInterface: outside, DE: Primary Detection Engine (dc1c2f78-185f-11e6-a6f7-dabf06bba1d5), Policy: SFR-Policy, ConnectType: Start, AccessControlRuleName: Unknown, AccessControlRuleAction: Allow, Prefilter Policy: Unknown, UserName: No Authentication Required, Client: SSL client, ApplicationProtocol: HTTPS, InitiatorPackets: 3, ResponderPackets: 1, InitiatorBytes: 715, ResponderBytes: 66, NAPPolicy: Balanced Security and Connectivity, DNSResponseType: No Error, Sinkhole: Unknown, URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com";
Why are you complicating it so much? Why not something like this:
(?:https|http|ftp)?:\/\/(?<URL>\S+)
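For anyone who wants to try this outside Splunk, here is a minimal Python sketch run against the tail of the sample event from the question. A few assumptions: Python's `re` module is not the same engine as Splunk's PCRE, so the named group is written `(?P<URL>...)` and the slashes are left unescaped; the question's expression is written with a plain capture group; and the variable names are only illustrative. It suggests one possible reason the longer expression misses the sample: that expression requires a `/` after the host name, and the sample URL has no path.

```python
import re

# Tail of the sample event from the question (trimmed for brevity).
event = ("URLCategory: Uncategorized, URLReputation: Risk unknown, "
         "URL: https://www.splunk.com")

# The simpler pattern. Python spells named groups (?P<name>...),
# and "/" needs no escaping in a Python regex.
simple = re.compile(r"(?:https|http|ftp)?://(?P<URL>\S+)")

# The question's pattern, written with a plain capture group (assumption).
# Note the literal "/" required between the host and the path.
original = re.compile(
    r"(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*"
)

m = simple.search(event)
url_field = m.group("URL") if m else None   # text after "://"
full_url = m.group(0) if m else None        # whole match, scheme included

# The original pattern finds nothing here because the URL has no path...
original_hit = original.search(event)
# ...but it does match once a trailing "/" is present (hypothetical input).
original_hit_with_slash = original.search(event + "/")

print(url_field, full_url, original_hit, original_hit_with_slash)
```

Note that the `URL` group in the simpler pattern holds only what follows `://`. To keep the scheme in the extracted field, the named group would have to wrap the whole expression.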
I wanted to accept all of the answers, but I accepted the one I used to accomplish my goal. I appreciate everyone's input. I started with regex101 last week and intend to use it to get me further along.
You can upvote any answer or comment (and should, if they helped or educated you at all).
You need to escape special characters like slash and period.
Please refer to the following link for a list of special characters.
http://regular-expressions.mobi/characters.html?wlr=1
Hope that helps.