How to optimize a regex function of concurrent consonants?

twisterdavemdCM — Tue, 11 Apr 2017 18:42:25 GMT

I'm trying to calculate a potential risk score from the number of consecutive consonants in a domain name (e.g. egorklwqyrjvbsxvhvcws.com is rarely a domain that people intentionally browse to 🙂).

So I'm pseudo-coding for Splunk in my mind, and I'm envisioning a mess of PCRE regexes for the assessment criteria that's going to thrash our forwarders and indexers.

Is there a better way to implement the following structure?

Set (Consonant_Risk_value) = 0%

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{5})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{6})/i
THEN set (Consonant_Risk_value) = 40%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{7})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{8})/i
THEN set (Consonant_Risk_value) = 60%

ELSE IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{9,})/i
THEN set (Consonant_Risk_value) = 80%

Re: How to optimize a regex function of concurrent consonants?

woodcock — Tue, 11 Apr 2017 19:07:47 GMT

Like this:

... | eval Consonant_Risk_value=case(match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{9,}"), "80%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{7}"), "60%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{5}"), "40%",
                                     true(), "0%")
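
Because an unanchored {5} also matches any longer run, the tiers depend only on the length of the longest consonant run, which can be found in a single pass. The following is a minimal Python sketch for checking that logic outside Splunk, assuming the thresholds from the question (5-6 gives 40%, 7-8 gives 60%, 9 or more gives 80%). The function names are hypothetical and are not part of Splunk or any app.

```python
import re

# Same character class as in the thread: "y" is treated as a consonant.
# Dots and digits are not in the class, so they end a run.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+", re.IGNORECASE)


def longest_consonant_run(domain_name: str) -> int:
    """Return the length of the longest run of consecutive consonants."""
    runs = CONSONANT_RUN.findall(domain_name)
    return max((len(run) for run in runs), default=0)


def consonant_risk(domain_name: str) -> str:
    """Map the longest run to the risk tiers assumed from the question."""
    longest = longest_consonant_run(domain_name)
    if longest >= 9:
        return "80%"
    if longest >= 7:
        return "60%"
    if longest >= 5:
        return "40%"
    return "0%"
```

For the example domain from the question, egorklwqyrjvbsxvhvcws.com, the longest run is 18 characters, so it lands in the 80% tier.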

P.S. Have you heard about Shannon Entropy?
https://www.splunk.com/blog/2016/04/21/when-entropy-meets-shannon/
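
For readers unfamiliar with the idea, here is a minimal Python sketch of the textbook character-level Shannon entropy formula, H = -Σ p(c) · log2 p(c). It is an illustration only; the function name is hypothetical, and it is an assumption that the tools mentioned in this thread compute the score in the same way.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(p) over each distinct character, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A string of one repeated character scores 0, while a string whose characters are all different scores log2 of its length, so random-looking names tend to score higher than ordinary words of similar length.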

Re: How to optimize a regex function of concurrent consonants?

richgalloway — Tue, 11 Apr 2017 19:08:22 GMT

For something similar, check out the ut_shannon() function in the URL Toolbox app (https://splunkbase.splunk.com/app/2734/#/details).