I have been through the field extractor, answers.splunk.com, and the interwebs looking for help on this one. Our Palo Alto gives us the URLs of the sites visited. Here is a sample:
crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl
safebrowsing-cache.google.com/
p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/
de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html
a248.e.akamai.net/
I would like to be able to extract the domains, e.g.:
microsoft or microsoft.com
google or google.com
gstatic or gstatic.com
tynt or tynt.com
akamai or akamai.net
I would think the way to go about it is to look for the FIRST .com, .net, .org, etc., and then work back to the previous "." to grab the domain, but that is beyond me.
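For what it's worth, the "first TLD, then back to the previous dot" idea can be sketched in a few lines of Python. This is a minimal sketch, not a Splunk command: KNOWN_TLDS and extract_domain are hypothetical names, the TLD list is deliberately tiny (a real one, such as the Public Suffix List, is far longer), and it assumes the logged URLs have no scheme (no http://), as in the sample above. It only looks at the host part before the first "/", so the lyricstranslate.com inside the tynt query string is ignored.

```python
from typing import Optional

# Hypothetical, deliberately short TLD list for illustration only.
KNOWN_TLDS = {"com", "net", "org", "gov", "edu"}


def extract_domain(url: str) -> Optional[str]:
    """Return 'name.tld' for the first known TLD in the host, else None."""
    # Assumes no scheme: the host is everything before the first "/".
    host = url.split("/", 1)[0]
    labels = host.split(".")
    # Walk left to right; the first label that is a known TLD wins,
    # and the label just before it is the name we want.
    for i in range(1, len(labels)):
        if labels[i].lower() in KNOWN_TLDS:
            return f"{labels[i - 1]}.{labels[i]}"
    # No known TLD found (e.g. a bare hostname or a ccTLD not in the list).
    return None
```

With this tiny list, something like bbc.co.uk returns None, which is the usual reason people reach for a full public-suffix list instead.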
Can anyone help?
Try this run-anywhere sample:
| gentimes start=-1 | eval URL="crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl safebrowsing-cache.google.com/ p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/ de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/l039amour-c039est-pour-rien-love-nothing.html a248.e.akamai.net/" | table URL | makemv URL | mvexpand URL | rex field=URL "(?<domain>\w+\.\w+)\/"
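To see what each stage of that search does, here is a rough Python equivalent run over the same sample string. It is only an illustration: SAMPLE, DOMAIN_RE and run_pipeline are made-up names, the space split stands in for makemv (whose default delimiter is a single space) plus mvexpand, and Python's re module needs the (?P<domain>...) spelling where rex accepts (?<domain>...).

```python
import re

# Same space-separated string the eval above builds.
SAMPLE = (
    "crl.microsoft.com/pki/crl/products/MicRooCerAut2011_2011_03_22.crl "
    "safebrowsing-cache.google.com/ "
    "p4-a2lp5grl52xoy-qpo2s4ky6vs36rpb-794312-s1-v6exp3-v4.metric.gstatic.com/ "
    "de.tynt.com/deb/v2?id=dZxfWCGner46jsacwqm_6l&r=lyricstranslate.com/en/"
    "l039amour-c039est-pour-rien-love-nothing.html "
    "a248.e.akamai.net/"
)

# The rex pattern; a plain "/" is equivalent to rex's escaped "\/".
DOMAIN_RE = re.compile(r"(?P<domain>\w+\.\w+)/")


def run_pipeline(url_field):
    rows = []
    # makemv URL + mvexpand URL: split on spaces, one row per URL.
    for url in url_field.split(" "):
        # rex: take the leftmost match in the value, if any.
        m = DOMAIN_RE.search(url)
        rows.append((url, m.group("domain") if m else None))
    return rows


for url, domain in run_pipeline(SAMPLE):
    print(domain)
```

re.search, like rex with its default of one match, takes the leftmost match, which is why the tynt line yields tynt.com rather than lyricstranslate.com.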
You have it, but help me understand it so that I can apply it to my search. As @Rhin0Crash stated, the Palo Altos see the field as "url", so my base search is: index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=*
index=pan_logs sourcetype=pan* src_ip=x.x.x.x url=* | rex field=URL "(?<domain>\w+\.\w+)\/" | table domain _raw
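Yup, you got it! Just keep in mind that Splunk field names are case-sensitive, so rex has to read the lowercase url field: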
| rex field=url "(?<domain>\w+\.\w+)\/"
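Since you asked how the pattern works, here it is again as a commented Python regex (verbose mode lets each token carry its own comment). This is a sketch for understanding only: DOMAIN_RE and domain_of are hypothetical names, and the extra hosts in the checks are made-up inputs. It also shows why the backslash matters: without it, "." matches any character.

```python
import re

# The rex pattern in verbose form. Python writes the named group as
# (?P<domain>...) where Splunk's rex writes (?<domain>...).
DOMAIN_RE = re.compile(
    r"""
    (?P<domain>   # start the capture that becomes the 'domain' field
        \w+       # one or more letters, digits or underscores, e.g. 'microsoft'
        \.        # a literal dot; an unescaped dot would match any character
        \w+       # another run of word characters, e.g. 'com'
    )             # end of the capture
    /             # a slash right after it (rex's \/); not part of the capture
    """,
    re.VERBOSE,
)


def domain_of(url):
    # Leftmost match only, like rex's default single match.
    m = DOMAIN_RE.search(url)
    return m.group("domain") if m else None
```

Because it grabs the two word-character labels right before a slash, a host such as www.bbc.co.uk/news comes back as co.uk, and a URL logged without a trailing slash gives no match at all.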
search | rex field=_raw "(?<domain>\w+)\.(com|net|gov|edu|co)"
That should do it, I think. In field=_raw, you can swap in whichever field the PA gives you for the URL; that might be URL, misc, or uri.
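Here is the same kind of check for the TLD-list version, again as a Python sketch with a hypothetical helper name (domain_name) and made-up test hosts. It returns just the name (microsoft, akamai) rather than name.tld, and two quirks show up: .org is not in the list, so an .org host gives no match, and because nothing stops the alternation mid-label, "co" can match the start of a longer label (cdn.contoso.com gives cdn). TIGHT_RE, which adds org and a \b word boundary, is one possible tweak of my own, untested against real PA logs.

```python
import re

# The rex alternation as written above (Python named-group syntax).
TLD_RE = re.compile(r"(?P<domain>\w+)\.(com|net|gov|edu|co)")

# Hypothetical tightened variant: adds 'org' and requires the TLD to end
# at a word boundary, so 'co' can no longer match the start of 'contoso'.
TIGHT_RE = re.compile(r"(?P<domain>\w+)\.(com|net|org|gov|edu|co)\b")


def domain_name(raw, pattern=TLD_RE):
    # Leftmost match only, like rex's default single match.
    m = pattern.search(raw)
    return m.group("domain") if m else None
```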