How to optimize a regex function of concurrent consonants?

New Member

I'm trying to calculate a potential risk score from the number of consecutive consonants in a domain name (e.g. egorklwqyrjvbsxvhvcws.com is rarely a domain that people intentionally browse to). 🙂

So I'm pseudo-coding for Splunk in my mind, and I'm envisioning a mess of PCRE regexes for the assessment criteria that is going to thrash our forwarders and indexers.

Is there a better way to implement the following structure?

Set (Consonant_Risk_value) = 0%

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{5})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{6})/i
THEN set (Consonant_Risk_value) = 40%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{7})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{8})/i
THEN set (Consonant_Risk_value) = 60%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{9,})/i
THEN set (Consonant_Risk_value) = 80%

Re: How to optimize a regex function of concurrent consonants?

Esteemed Legend

Like this. Because case() returns the value for the first condition that matches, and match() is unanchored, test the longest run first; {7} then covers runs of 7 or 8 and {5} covers runs of 5 or 6. Case-insensitivity goes inside the regex as (?i), not as a trailing /i:

... | eval Consonant_Risk_value=case(match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{9,}"), "80%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{7}"), "60%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{5}"), "40%",
                                     true(), "0%")
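
If you want to sanity-check the thresholds against a list of known-good and known-bad domains before putting anything into a search, the same tiering can be prototyped outside Splunk. This is only a rough Python sketch, not part of the SPL above; the function names are made up for illustration, and it assumes the same consonant set (with "y" counted as a consonant) and the same 5/7/9 cut-offs from the question.

```python
import re

# Same consonant class as the question; "y" is treated as a consonant.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+", re.IGNORECASE)


def longest_consonant_run(domain_name: str) -> int:
    """Length of the longest unbroken run of consonants in the string."""
    runs = CONSONANT_RUN.findall(domain_name)
    return max((len(run) for run in runs), default=0)


def consonant_risk(domain_name: str) -> str:
    """Map the longest consonant run to the risk tiers from the question."""
    run = longest_consonant_run(domain_name)
    if run >= 9:
        return "80%"
    if run >= 7:
        return "60%"
    if run >= 5:
        return "40%"
    return "0%"


if __name__ == "__main__":
    for name in ("egorklwqyrjvbsxvhvcws.com", "strengths.com", "google.com"):
        print(name, consonant_risk(name))
```

Measuring the longest run once and then comparing numbers avoids running several regexes per event, which is the same idea as ordering the case() branches from longest to shortest.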

P.S. Have you heard about Shannon Entropy?
https://www.splunk.com/blog/2016/04/21/when-entropy-meets-shannon/
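
For anyone who has not met it before, Shannon entropy here just means the average number of bits per character, based on how often each character appears in the string. Below is a minimal Python sketch of that calculation so you can see what the score looks like; the function name is hypothetical, and it is not the code from the blog post or from any Splunk app.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for name in ("google", "egorklwqyrjvbsxvhvcws"):
        print(name, round(shannon_entropy(name), 3))
```

Random-looking names tend to score higher than dictionary-like ones, but short strings have a low ceiling on entropy, so any threshold would need tuning against your own data.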


Re: How to optimize a regex function of concurrent consonants?

SplunkTrust

For something similar, check out the ut_shannon() function in the URL Toolbox app (https://splunkbase.splunk.com/app/2734/#/details).
