How to optimize a regex function of concurrent consonants?

twisterdavemdCM · ‎04-11-2017

I'm trying to calculate a potential risk score from the number of consecutive consonants in a domain name. For example, egorklwqyrjvbsxvhvcws.com is rarely a domain that people intentionally browse to. 🙂

So I'm pseudo-coding for Splunk in my mind, and I'm envisioning a mess of PCRE regexes for the assessment criteria that is going to thrash our forwarders and indexers.

Is there a better way to implement the following structure?

Set (Consonant_Risk_value) = 0%

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{5})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{6})/i
THEN set (Consonant_Risk_value) = 40%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{7})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{8})/i
THEN set (Consonant_Risk_value) = 60%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{9,})/i
THEN set (Consonant_Risk_value) = 80%

richgalloway · ‎04-11-2017

For something similar, check out the ut_shannon() function in the URL Toolbox app (https://splunkbase.splunk.com/app/2734/#/details).


woodcock · ‎04-11-2017

Like this:

... | eval Consonant_Risk_value=case(match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{9,}"), "80%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{7}"), "60%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{5}"), "40%",
                                     true(), "0%")
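
The tiers can be sanity-checked outside Splunk before the search goes anywhere near production. The following is only a rough Python sketch, not part of the search above: the function names longest_consonant_run and consonant_risk are hypothetical, and it assumes the same consonant set as the thread (with "y" counted as a consonant) and the same 5-6 / 7-8 / 9+ thresholds. It finds the longest consonant run once and then picks the tier, which is the effect of testing the longest pattern first.

```python
import re

# Same character class as in the thread; "y" is treated as a consonant.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+", re.IGNORECASE)


def longest_consonant_run(domain_name: str) -> int:
    """Return the length of the longest unbroken run of consonants."""
    runs = CONSONANT_RUN.findall(domain_name)
    return max((len(run) for run in runs), default=0)


def consonant_risk(domain_name: str) -> str:
    """Map the longest run to the tiers from the question (5-6, 7-8, 9+)."""
    longest = longest_consonant_run(domain_name)
    if longest >= 9:
        return "80%"
    if longest >= 7:
        return "60%"
    if longest >= 5:
        return "40%"
    return "0%"


if __name__ == "__main__":
    for name in ("egorklwqyrjvbsxvhvcws.com", "splunk.com"):
        print(name, consonant_risk(name))
```

Dots and digits break a run here, so the ".com" suffix never merges with the label before it.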

P.S. Have you heard about Shannon Entropy?
https://www.splunk.com/blog/2016/04/21/when-entropy-meets-shannon/
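
For anyone who wants to see what the entropy suggestion measures, here is a minimal Python sketch of Shannon entropy over the characters of a string. It is an illustration only, not the URL Toolbox implementation; the function name shannon_entropy is hypothetical, and any cutoff for "suspicious" would have to be tuned against your own data.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(p) over each distinct character, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for label in ("google", "egorklwqyrjvbsxvhvcws"):
        print(label, round(shannon_entropy(label), 3))
```

Random-looking labels spread their characters more evenly and so score higher than dictionary-like ones, which is why entropy is often used alongside (or instead of) consonant-run counting.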
