Splunk Search

How to parse proxy logs to create a list of root-level domains and TLDs?

packet_hunter
Contributor

Scenario:

I am trying to create a list of all the unique domains (from web requests) from the proxy.

Currently I am using:

index=websurf | stats values(dest)

websurf is the index of the proxy logs
dest is the destination IP or domain the user is going to.

The goal is to extract the rootdomain.tld from the results.
I also need a way to separate the IPs (numeric only) from the domains (alphanumeric), and then reduce each alphanumeric subdomain.rootdomain.tld to rootdomain.tld. I am having no luck writing a rex that anchors at the TLD and captures the root domain.

Some example results:

09458kf84ks8.subdom.rootdomain.tld
www.yahoo.com
1.bad.blogspot.ru
54.239.127.240
65.media.tumblr.com 
a.abc.com

Ideally I need two searches: one to extract all the IP(s) from the results and another search to extract all the rootdomain.tld(s).

Thank you
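
The two-way split described above can be prototyped outside Splunk before it is turned into searches. The following is a minimal Python sketch, not a tested Splunk solution: the names `IP_RE`, `ROOT_RE`, `split_dests`, and `SAMPLES` are hypothetical, and the "last two labels" rule is an assumption that gives the wrong answer for multi-part suffixes such as co.uk. The same two patterns should carry over to Splunk's `regex` and `rex` commands, where a named group is written (?<name>...) rather than Python's (?P<name>...).

```python
import re

# Sample dest values copied from the question (one has a trailing space).
SAMPLES = [
    "09458kf84ks8.subdom.rootdomain.tld",
    "www.yahoo.com",
    "1.bad.blogspot.ru",
    "54.239.127.240",
    "65.media.tumblr.com ",
    "a.abc.com",
]

# Hypothetical pattern: four dot-separated groups of 1-3 digits.
# It checks shape only, not that each octet is <= 255.
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

# Hypothetical pattern: the last two dot-separated labels, anchored at the end.
# Assumption: a single-label TLD; "example.co.uk" would come out as "co.uk".
ROOT_RE = re.compile(r"(?P<root>[^.\s]+\.[^.\s]+)\s*$")


def split_dests(dests):
    """Return (sorted unique IPs, sorted unique rootdomain.tld values)."""
    ips, roots = set(), set()
    for dest in dests:
        value = dest.strip().lower()
        if IP_RE.match(value):
            ips.add(value)
            continue
        match = ROOT_RE.search(value)
        if match:
            roots.add(match.group("root"))
    return sorted(ips), sorted(roots)


ips, roots = split_dests(SAMPLES)
print(ips)
print(roots)
```

Checking the IP pattern first matters: otherwise 54.239.127.240 would be reduced to its last two octets by the root-domain pattern.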

1 Solution

packet_hunter
Contributor

Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the first / after the host onward.

index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)
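
The pattern is meant to capture the host that follows // into a field named domain. As a quick offline check, here is a minimal Python sketch of the same idea. `HOST_RE` and `extract_host` are hypothetical names, Python spells the named group (?P<domain>...) where Splunk's rex uses (?<domain>...), and the hyphen in the character class is an addition that is not in the original pattern (hostnames may contain hyphens).

```python
import re

# Assumption: the host is whatever follows "//" up to the first character
# that is not a word character, dot, or hyphen (the hyphen is an addition).
HOST_RE = re.compile(r"//(?P<domain>[\w.-]+)")


def extract_host(url):
    """Return the host part after '//', or None if the URL has no '//'."""
    match = HOST_RE.search(url)
    return match.group("domain") if match else None


url = "http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg"
host = extract_host(url)
print(host)  # a57.foxnews.9.com
```

Because the search stops at the first match, the second hostname embedded in the path (www.foxnews.com) is ignored.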



packet_hunter
Contributor

Any rex kung fu masters out there? A little help, please.

I cannot get this rex to work in Splunk, although it works in the online regex tester at https://regex101.com/.
Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the first / after the host onward.
I have tried the following, but it keeps erroring out:
index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)

Any help is much appreciated.


packet_hunter
Contributor

Never mind, it is working now. It must have been a glitch.
