Splunk Search

What is wrong with this regular expression to extract the URL from our logs?

harrisoncs
Explorer

I am attempting to extract the URL from our web filter logs. The automatic field extraction did not work. I now have a partially working expression, but I can't find the reason it doesn't work fully. See below:

(?(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)

This command is only returning a couple of http URLs. It is not getting any https URLs, even though the preview shows plenty of candidates. Is there something simple I'm missing? One iteration had only https in the expression; however, it returned no results. As it stands now, the expression returns no results for the sample data below, because that URL is https.

Sample data (IPs have been changed)

"May 12 15:30:26 10.10.10.10 May 12 19:30:21 Sourcefire3D WFAccessURL: Protocol: TCP, SrcIP: 20.20.20.20, OriginalClientIP: ::, DstIP: 30.30.30.93, SrcPort: 64776, DstPort: 443, TCPFlags: 0x0, IngressInterface: Cisco, EgressInterface: outside, DE: Primary Detection Engine (dc1c2f78-185f-11e6-a6f7-dabf06bba1d5), Policy: SFR-Policy, ConnectType: Start, AccessControlRuleName: Unknown, AccessControlRuleAction: Allow, Prefilter Policy: Unknown, UserName: No Authentication Required, Client: SSL client, ApplicationProtocol: HTTPS, InitiatorPackets: 3, ResponderPackets: 1, InitiatorBytes: 715, ResponderBytes: 66, NAPPolicy: Balanced Security and Connectivity, DNSResponseType: No Error, Sinkhole: Unknown, URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com";
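One likely reason the https sample fails: the host character class `[a-zA-Z0-9.\-_]+` must be followed by a literal `/`, and `https://www.splunk.com` has no path, so there is no slash after the host. The sketch below checks this with Python's `re` module (not the PCRE engine Splunk uses, so edge cases may differ). It leaves out the leading `(?` and the final `)` of the posted expression, since they do not form a complete group as posted. The `path_optional` pattern is a hypothetical variant that makes the `/path` part optional; it is an illustration, not a tested Splunk extraction.

```python
import re

# Body of the posted expression, without the leading "(?" and final ")".
original = re.compile(r"(https|http|ftp)://[a-zA-Z0-9.\-_]+/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*")

# Hypothetical variant: the "/path" part is wrapped in an optional group.
path_optional = re.compile(
    r"(https|http|ftp)://[a-zA-Z0-9.\-_]+(?:/[a-zA-Z0-9+&@#/%=~_\-|!:,.;]*)?"
)

# Tail of the sample event from the question.
sample = 'URLCategory: Uncategorized, URLReputation: Risk unknown, URL: https://www.splunk.com";'


def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(original, sample))                          # None (no "/" after the host)
print(first_match(original, "URL: https://www.splunk.com/"))  # https://www.splunk.com/
print(first_match(path_optional, sample))                     # https://www.splunk.com
```

This would also explain why only some http URLs came back: any URL that has a path after the host satisfies the required slash.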

woodcock
Esteemed Legend

Why are you complicating it so much? Why not something like this:

(?:https|http|ftp)?:\/\/(?<URL>\S+)
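A minimal way to see what this pattern captures, using Python's `re` module. Python requires the `(?P<URL>...)` spelling for a named group, so only the group syntax is adapted; the rest of the pattern is unchanged. Because `\S+` runs to the next whitespace, the captured field leaves out the scheme and keeps any trailing non-space characters, such as the `";` at the end of the posted sample. The second test string is made up for illustration.

```python
import re

# Same pattern as above; Python's re needs (?P<name>...) for a named group.
URL_RE = re.compile(r"(?:https|http|ftp)?:\/\/(?P<URL>\S+)")


def extract_url(event):
    """Return the URL field as this pattern would capture it, or None."""
    m = URL_RE.search(event)
    return m.group("URL") if m else None


sample = 'URLReputation: Risk unknown, URL: https://www.splunk.com";'
print(extract_url(sample))                                # www.splunk.com";
print(extract_url("URL: http://example.com/a?b=1 next"))  # example.com/a?b=1
print(extract_url("no url here"))                         # None
```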

harrisoncs
Explorer

I wanted to accept all of the answers, but I accepted the one I used to accomplish my goal. Appreciate everyone's input. I started using regex101 last week and intend to use it to get me further along.

woodcock
Esteemed Legend

You can upvote any answer or comment (and should, if they helped or educated you at all).

ddrillic
Ultra Champion

You can start simple:

This one matches: (https|http|ftp):\/\/www.splunk.com

and then:

(https|http|ftp):\/\/([a-zA-Z0-9\.]*)
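The same two steps can be reproduced outside regex101 with Python's `re` module. This is only a sketch for experimenting; Python's engine is not identical to the PCRE engine Splunk uses. The first pattern matches one known host literally; the second captures whatever run of letters, digits and dots follows the scheme.

```python
import re

sample = 'URL: https://www.splunk.com";'

# Step 1: a literal pattern for one known URL.
step1 = re.search(r"(https|http|ftp):\/\/www.splunk.com", sample)
print(step1.group(0))                  # https://www.splunk.com

# Step 2: generalize the host into a character class and capture it.
step2 = re.search(r"(https|http|ftp):\/\/([a-zA-Z0-9\.]*)", sample)
print(step2.group(1), step2.group(2))  # https www.splunk.com
```

Step 2 stops at the first character outside the class, so a path such as `/index.html` or a hyphen in a host name would not be captured.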

This utility is just sensational: regex101

grimlock
Path Finder

You need to escape special characters like the period. A forward slash has no special meaning in PCRE, but escaping it as \/ does no harm.
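A quick way to see why the period matters: outside a character class, an unescaped `.` matches any single character, so a pattern written for `www.splunk.com` also accepts strings where the dots are something else. Inside a character class such as `[.]`, the period is already literal. This sketch uses Python's `re` module; the look-alike string is made up for illustration.

```python
import re

lookalike = "wwwXsplunkXcom"  # hypothetical string, not a real host

# Unescaped dots: each "." matches any character, including "X".
print(bool(re.fullmatch(r"www.splunk.com", lookalike)))          # True
# Escaped dots: only a literal "." is accepted.
print(bool(re.fullmatch(r"www\.splunk\.com", lookalike)))        # False
print(bool(re.fullmatch(r"www\.splunk\.com", "www.splunk.com")))  # True
# Inside a character class the dot is already literal.
print(bool(re.fullmatch(r"[.]", "X")))                           # False
```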

Please reference the following link for special character list.
http://regular-expressions.mobi/characters.html?wlr=1

Hope that helps.
