How to parse proxy information to create a list of rootlevel domains and tld?

Scenario:

I am trying to create a list of all the unique domains (from web requests) from the proxy.

Currently I am using:

index=websurf | stats values(dest)

websurf is the index that holds the proxy logs.
dest is the destination IP or domain the user is going to.

The goal is to extract rootdomain.tld from the results.
I also need a way to separate the IPs (numeric only) from the domains (alphanumeric), and then reduce each alphanumeric subdomain.rootdomain.tld to rootdomain.tld. I am having no luck writing a rex for this that starts at the TLD and grabs the root domain.

Some example results are:

09458kf84ks8.subdom.rootdomain.tld
www.yahoo.com
1.bad.blogspot.ru
54.239.127.240
65.media.tumblr.com 
a.abc.com

Ideally I need two searches: one to extract all the IP(s) from the results and another search to extract all the rootdomain.tld(s).
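
For checking the regex logic outside Splunk, here is a minimal Python sketch of the two-way split described above. It assumes an IP is a plain IPv4 dotted quad and that rootdomain.tld means the last two dot-separated labels. The function name split_dests and both patterns are illustrative, not something taken from the proxy or from Splunk.

```python
import re

# Assumption: an IP is four dot-separated groups of 1-3 digits
# (IPv4 only, no 0-255 range check).
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

# Assumption: rootdomain.tld is the last two dot-separated labels.
ROOT_RE = re.compile(r"([^.]+\.[^.]+)$")


def split_dests(dests):
    """Return (sorted unique IPs, sorted unique rootdomain.tld values)."""
    ips, roots = set(), set()
    for dest in dests:
        dest = dest.strip()  # some proxy values carry trailing spaces
        if IPV4_RE.match(dest):
            ips.add(dest)
        else:
            m = ROOT_RE.search(dest)
            if m:  # values without a dot are skipped
                roots.add(m.group(1))
    return sorted(ips), sorted(roots)


samples = [
    "09458kf84ks8.subdom.rootdomain.tld",
    "www.yahoo.com",
    "1.bad.blogspot.ru",
    "54.239.127.240",
    "65.media.tumblr.com ",
    "a.abc.com",
]
ips, roots = split_dests(samples)
print(ips)
print(roots)
```

Under the last-two-labels assumption, a host such as news.bbc.co.uk would come out as co.uk, so multi-part suffixes would need a public-suffix lookup instead. Neither pattern uses Python-specific syntax, so they should carry over to a rex once a named capture group is added.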

Thank you


Re: How to parse proxy information to create a list of rootlevel domains and tld?

Any rex kung fu masters out there? I could use a little help.

I cannot get this rex to work, although it works in the online regex tester at https://regex101.com/
Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping the path that follows the hostname.
I have tried the following, but it keeps erroring out:
index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)

Any help is much appreciated


Re: How to parse proxy information to create a list of rootlevel domains and tld?

Never mind, it is working now. It must have been a glitch.


Re: How to parse proxy information to create a list of rootlevel domains and tld?

Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping the path that follows the hostname.

index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)
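
To see what that pattern captures without running a search, here is a small Python sketch that applies the same character class to the sample string. The helper name extract_host is hypothetical. Python writes the named group as (?P<domain>...), whereas the rex above uses (?<domain>...).

```python
import re

# Same pattern as the rex above, with Python's named-group spelling.
HOST_RE = re.compile(r"//(?P<domain>[\w\d.]+)")


def extract_host(url):
    """Return the text captured after '//' or None if there is no match."""
    m = HOST_RE.search(url)
    return m.group("domain") if m else None


url = "http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg"
host = extract_host(url)
print(host)
```

The character class has no hyphen in it, so a hostname like my-site.example.com would be cut off at the hyphen. Adding - to the class is one possible adjustment if such hosts show up in the logs.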
