Splunk Search

How to parse proxy logs to create a list of root-level domains and TLDs?

packet_hunter
Contributor

Scenario:

I am trying to create a list of all the unique domains requested (via web requests) through the proxy.

Currently I am using:

index=websurf | stats values(dest)

websurf is the index that holds the proxy logs.
dest is the destination IP or domain the user is browsing to.

The goal is to extract the rootdomain.tld from the results.
I also need a way to separate the IPs (numeric only) from the domains (alphanumeric), and then further reduce each alphanumeric subdomain.rootdomain.tld to rootdomain.tld. I am having no luck writing a rex for this that starts at the TLD and grabs the root domain.
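One way to "start at the TLD" is to anchor the pattern to the end of the string with $ and capture the last two dot-separated labels. The sketch below is in Python only so the pattern can be checked offline, much like regex101; note that Python spells a named group (?P<name>...) where Splunk rex (PCRE) uses (?<name>...). In rex the assumed equivalent would be rex field=dest "(?<root_domain>[^.]+\.[^.]+)$", where root_domain is a hypothetical field name and the search has not been tested here. Two caveats: this naive rule gives the wrong answer for multi-label public suffixes such as co.uk, and for an IP such as 54.239.127.240 it returns 127.240, so IPs have to be filtered out first.

```python
import re
from typing import Optional

# Last two labels, anchored at the end of the string: "<rootdomain>.<tld>".
# Because of the $ anchor, the match is effectively built from the TLD side.
ROOT_RE = re.compile(r"(?P<root>[^.]+\.[^.]+)$")

def root_domain(host: str) -> Optional[str]:
    """Return the last two dot-separated labels of host, or None if there is no dot."""
    match = ROOT_RE.search(host.strip().rstrip("."))  # tolerate a trailing FQDN dot
    return match.group("root") if match else None

for host in ["09458kf84ks8.subdom.rootdomain.tld", "www.yahoo.com",
             "1.bad.blogspot.ru", "65.media.tumblr.com", "a.abc.com"]:
    print(host, "->", root_domain(host))
```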

Some example results:

09458kf84ks8.subdom.rootdomain.tld
www.yahoo.com
1.bad.blogspot.ru
54.239.127.240
65.media.tumblr.com 
a.abc.com

Ideally I need two searches: one that extracts all the IPs from the results, and another that extracts all the rootdomain.tld values.
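The two searches can be prototyped as a single pass that sorts each dest value into an IPv4 bucket or a root-domain bucket. This is a Python sketch under stated assumptions: dest holds a bare hostname or IPv4 address (no scheme, port, or path), IPv6 is ignored, and the root domain is naively taken as the last two labels. The function name split_dest is hypothetical. In SPL, each search would typically filter with where match(dest, ...) before the rex and stats values() steps; the Python only checks the classification logic.

```python
import re
from typing import Iterable, List, Tuple

# One IPv4 octet: 0-255 without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")
# Naive root domain: the last two dot-separated labels (wrong for e.g. co.uk).
ROOT_RE = re.compile(r"([^.]+\.[^.]+)$")

def split_dest(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (unique IPv4 addresses, unique root domains), each sorted."""
    ips, roots = set(), set()
    for raw in values:
        value = raw.strip().lower().rstrip(".")
        if IPV4_RE.fullmatch(value):      # purely numeric dotted quad
            ips.add(value)
        else:                             # anything else is treated as a hostname
            match = ROOT_RE.search(value)
            if match:
                roots.add(match.group(1))
    return sorted(ips), sorted(roots)

SAMPLE = [
    "09458kf84ks8.subdom.rootdomain.tld",
    "www.yahoo.com",
    "1.bad.blogspot.ru",
    "54.239.127.240",
    "65.media.tumblr.com ",
    "a.abc.com",
]

ips, roots = split_dest(SAMPLE)
print("IPs:", ips)
print("Root domains:", roots)
```

Sets give the "unique" part, and sorting gives a stable list that is easy to compare against stats values() output.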

Thank you

packet_hunter
Contributor

Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the / that follows the host.

index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)
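A quick offline check of this pattern, written in Python (the named group is spelled (?P<domain>...) instead of rex's (?<domain>...)). It confirms that the character class stops at the first / after the host. It also shows one limitation: [\w\d.] contains no hyphen (and \d is redundant, since \w already matches digits), so a hostname containing - is cut short. The hyphenated URL below is a made-up example, and [\w.-] is one possible fix, not something from the original post. Python's urllib.parse.urlsplit is shown as an independent cross-check.

```python
import re
from urllib.parse import urlsplit

# Character class as posted, and a variant that also accepts hyphens.
POSTED = re.compile(r"//(?P<domain>[\w\d.]+)")
HYPHEN_SAFE = re.compile(r"//(?P<domain>[\w.-]+)")

url = "http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg"
print(POSTED.search(url).group("domain"))        # stops at the first "/" after the host
print(urlsplit(url).hostname)                     # standard-library parse of the same URL

hyphenated = "http://my-proxy.example.com/index.html"  # hypothetical URL
print(POSTED.search(hyphenated).group("domain"))       # truncated at the "-"
print(HYPHEN_SAFE.search(hyphenated).group("domain"))  # full hostname
```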

packet_hunter
Contributor

Any rex kung fu masters out there? A little help, please.

I cannot get this rex to work in Splunk, even though it works in the online regex tester at https://regex101.com/.
Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the / that follows the host.
I have tried the following, but it keeps erroring out:
index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)

Any help is much appreciated

packet_hunter
Contributor

Never mind, it is working now. It must have been a glitch.
