Hi buddy,
I followed your answer, and it does return a result list with the dga tag. However, there seem to be too many false-positive DGA domains. My SPL is listed below:
index=nids event_type=dns earliest=-5min
| stats count by query
| rename query as domain
| fields domain
| eval _time=now()-(60.000*random()/2147483647)
| table _time domain
| `ut_shannon(domain)`
| `ut_meaning(domain)`
| eval ut_digit_ratio = 0.0
| eval ut_vowel_ratio = 0.0
| eval ut_domain_length = max(1,len(domain))
| rex field=domain max_match=0 "(?<digits>\d)"
| rex field=domain max_match=0 "(?<vowels>[aeiou])"
| eval ut_digit_ratio=if(isnull(digits),0.0,mvcount(digits) / ut_domain_length)
| eval ut_vowel_ratio=if(isnull(vowels),0.0,mvcount(vowels) / ut_domain_length)
| eval ut_consonant_ratio = max(0.0, 1.000000 - ut_digit_ratio - ut_vowel_ratio)
| eval ut_vc_ratio = ut_vowel_ratio / ut_consonant_ratio
| apply "dga_ngram"
| apply "dga_pca"
| apply "dga_randomforest" as class
| fields - digits - vowels - domain_tfidf*
| where class="dga"
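To sanity-check the hand-built ratio features before blaming the models, the same eval/rex steps can be run on a single domain. This is only a sketch: the domain `abc123.com` is a made-up test value, not one from my data, and `makeresults` just generates a one-row result to work on. Note that with this definition ut_consonant_ratio also counts dots and hyphens, since it is simply whatever is left after digits and vowels.

```
| makeresults
| eval domain="abc123.com"
| eval ut_domain_length = max(1,len(domain))
| rex field=domain max_match=0 "(?<digits>\d)"
| rex field=domain max_match=0 "(?<vowels>[aeiou])"
| eval ut_digit_ratio=if(isnull(digits),0.0,mvcount(digits) / ut_domain_length)
| eval ut_vowel_ratio=if(isnull(vowels),0.0,mvcount(vowels) / ut_domain_length)
| eval ut_consonant_ratio = max(0.0, 1.000000 - ut_digit_ratio - ut_vowel_ratio)
| eval ut_vc_ratio = ut_vowel_ratio / ut_consonant_ratio
| table domain ut_domain_length ut_digit_ratio ut_vowel_ratio ut_consonant_ratio ut_vc_ratio
```

If the ratios look reasonable here, the false positives are more likely coming from the trained models than from the feature extraction.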
I also tried to improve my models by retraining them on the dga_test dataset, but the final results are still not satisfactory.
How can I feed these real DNS domains into the DGA app and then use step 4 to check and adjust the domain names classified as DGA, for further blacklisting/whitelisting and future learning?