Splunk Search

How to optimize a regex function of concurrent consonants?

twisterdavemdCM
New Member

I'm trying to calculate a potential risk score from the number of consecutive consonants in a domain name (e.g. egorklwqyrjvbsxvhvcws.com is rarely a domain that people intentionally browse... 🙂).

So I'm pseudo-coding for Splunk in my mind, and I'm envisioning a mess of PCRE regexes for the assessment criteria that's going to thrash our forwarders and indexers.

Is there a better way to implement the following structure?

Set (Consonant_Risk_value) = 0%

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{5})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{6})/i
THEN set (Consonant_Risk_value) = 40%

ELSE

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{7})/i OR Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{8})/i
THEN set (Consonant_Risk_value) = 60%

ELSE

IF Rex(domain_name)/([bcdfghjklmnpqrstvwxyz]{9,})/i
THEN set (Consonant_Risk_value) = 80%

richgalloway
SplunkTrust

For something similar, check out the ut_shannon() function in the URL Toolbox app (https://splunkbase.splunk.com/app/2734/#/details).

---
If this reply helps you, Karma would be appreciated.

woodcock
Esteemed Legend

Like this:

... | eval Consonant_Risk_value=case(match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{9,}"), "80%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{7,}"), "60%",
                                     match(domain_name, "(?i)[bcdfghjklmnpqrstvwxyz]{5,}"), "40%",
                                     true(), "0%")

case() returns the value for the first condition that is true, so the longest run is tested first. match() is unanchored, which means a single {7,} test already covers runs of 7 or 8, and (?i) inside the pattern makes it case-insensitive (a trailing /i is not valid in a match() string).
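
If you want to sanity-check the thresholds against a sample of domains outside Splunk, the same tiers can be sketched in Python. This is only an illustration, not SPL: the function names longest_consonant_run and consonant_risk are made up for this example, and it assumes the same consonant set used in this thread (which counts "y" as a consonant). It finds the longest run once and then buckets it, rather than running one regex per tier.

```python
import re

# Same consonant set as the thread (it treats "y" as a consonant).
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+", re.IGNORECASE)


def longest_consonant_run(domain_name: str) -> int:
    """Return the length of the longest unbroken run of consonants."""
    runs = (len(m.group()) for m in CONSONANT_RUN.finditer(domain_name))
    return max(runs, default=0)


def consonant_risk(domain_name: str) -> str:
    """Map the longest run onto the thread's tiers: 5-6, 7-8, and 9+."""
    run = longest_consonant_run(domain_name)
    if run >= 9:
        return "80%"
    if run >= 7:
        return "60%"
    if run >= 5:
        return "40%"
    return "0%"


if __name__ == "__main__":
    for name in ("egorklwqyrjvbsxvhvcws.com", "strengths.com", "splunk.com"):
        print(name, longest_consonant_run(name), consonant_risk(name))
```

Note that ordinary words such as "strengths" already reach the 40% tier, so the thresholds may need tuning against real traffic.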

P.S. Have you heard about Shannon Entropy?
https://www.splunk.com/blog/2016/04/21/when-entropy-meets-shannon/
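
For anyone who has not met it, here is a rough Python sketch of character-level Shannon entropy, H = -sum(p * log2(p)) over the character frequencies of a string. The function name shannon_entropy is just for this example, and I am not claiming this is exactly what ut_shannon() computes internally; it only shows the general idea that random-looking names tend to score higher than names that repeat a few characters.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for name in ("egorklwqyrjvbsxvhvcws", "google", "aaaaaa"):
        print(name, round(shannon_entropy(name), 3))
```

Entropy depends on string length as well as randomness, so it is probably best treated as one signal next to the consonant-run score rather than a replacement for it.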
