Splunk Search

How to write the regex to extract URLs 32 to 48 characters in length and ending with .ru or .org?

avis1119
New Member

Hi Everyone,

I would like to write a regex that extracts URLs that are 32 to 48 characters long and end with .ru or .org. There should not be any special characters before the .org or .ru. Please help me write the regex.

Thank you in advance.


aljohnson_splun
Splunk Employee

Hey @avis1119,

What's good?

^(?<long_url>[\w\.-]{32,48}(?=(?:\.org|\.ru)))

So this regular expression just captures a 32-48 character string that sits immediately BEFORE a .org or a .ru. The TLD itself is only checked by the lookahead, so it is not part of the captured value. It's not very robust per se: the character class matches word characters (a-z, A-Z, 0-9, and underscore) plus a literal . and a dash. But if you are using it with rex in Splunk and have already defined a URL field, it should be fine, e.g.

| rex field=URL "^(?<long_url>[\w\.-]{32,48}(?=(?:\.org|\.ru)))"

https://regex101.com/r/vI6bY5/1
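
If you want to sanity-check the pattern outside Splunk, here is a minimal Python sketch. It is an adaptation, not what rex runs: Python's re module spells named groups (?P<name>...) instead of (?<name>...), and the function name and sample values are made up for illustration.

```python
import re

# Same pattern as the rex above, adapted to Python's (?P<name>...) syntax.
PATTERN = re.compile(r"^(?P<long_url>[\w\.-]{32,48}(?=(?:\.org|\.ru)))")


def extract_long_url(url):
    """Return the 32-48 character prefix before .org/.ru, or None."""
    match = PATTERN.search(url)
    return match.group("long_url") if match else None


# Hypothetical sample values, not taken from real data.
samples = [
    "a" * 32 + ".org",                  # 32 characters before the TLD
    "a" * 31 + ".org",                  # one character too short
    "a" * 20 + "-" + "b" * 15 + ".ru",  # dash is allowed by [\w\.-]
    "a" * 40 + ".net",                  # wrong TLD
]

for sample in samples:
    print(sample, "->", extract_long_url(sample))
```

The first and third samples should produce a capture and the other two should not. Note that the third one matches even though it contains a dash, because the character class explicitly allows it.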


avis1119
New Member

I already have the URL field defined.
It is not giving exactly the output I need: there should be no special characters, not even "." or "-", before the TLD. For example:
hgwoui87864vhvbviobigb23Ajkbbjsgivu.org
eufvuUHOUVuw8y9814hviyiwh9283bhvcsdvg2tnbgbv.net


aljohnson_splun
Splunk Employee

Just replace [\w\.-] with whatever character class you want: [a-z] for lowercase only, [a-zA-Z] for lowercase and uppercase, or [a-zA-Z0-9] for letters and digits.
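
As a rough check of the letters-and-digits variant, here is a small Python sketch run against the two example values from the post above. It assumes Python's (?P<name>...) named-group syntax in place of the (?<name>...) form rex accepts, and the function name is hypothetical.

```python
import re

# Character class narrowed to letters and digits only; the rest of the
# pattern is unchanged apart from Python's (?P<name>...) group syntax.
ALNUM_PATTERN = re.compile(
    r"^(?P<long_url>[a-zA-Z0-9]{32,48}(?=(?:\.org|\.ru)))"
)


def extract_alnum_url(url):
    """Return the alphanumeric prefix before .org/.ru, or None."""
    match = ALNUM_PATTERN.search(url)
    return match.group("long_url") if match else None


# The two example values quoted earlier in the thread.
org_example = "hgwoui87864vhvbviobigb23Ajkbbjsgivu.org"
net_example = "eufvuUHOUVuw8y9814hviyiwh9283bhvcsdvg2tnbgbv.net"

print(extract_alnum_url(org_example))  # prefix is captured
print(extract_alnum_url(net_example))  # None: .net is not .org or .ru
```

With this class, any value containing a dot, dash, or underscore before the TLD no longer matches, because the anchored run of 32 to 48 characters must consist of letters and digits only.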
