Splunk Search

How to parse proxy information to create a list of root-level domains and TLDs?

packet_hunter
Contributor

Scenario:

I am trying to create a list of all the unique domains (from web requests) from the proxy.

Currently I am using:

index=websurf | stats values(dest)

websurf is the index of the proxy logs
dest is the destination IP or domain the user is going to.

The goal is to extract the rootdomain.tld from the results.
I also need a way to separate the IPs (numeric only) from the domains (alphanumeric), and then reduce each alphanumeric subdomain.rootdomain.tld value to rootdomain.tld. I am having no luck writing a rex for this that anchors at the TLD and captures the root domain before it.

Some example results are:

09458kf84ks8.subdom.rootdomain.tld
www.yahoo.com
1.bad.blogspot.ru
54.239.127.240
65.media.tumblr.com 
a.abc.com

Ideally I need two searches: one to extract all the IP(s) from the results and another search to extract all the rootdomain.tld(s).

Thank you
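
For checking the logic outside Splunk, here is a minimal Python sketch of the two-way split described above, run against the sample values from the question. It assumes an IPv4 address is four dot-separated groups of digits and that the root domain is simply the last two labels; the function name split_dest is hypothetical. The two-label rule is naive: multi-part suffixes such as co.uk would need a public suffix list to handle correctly.

```python
import re

# Assumption: an IPv4 address is four dot-separated groups of 1-3 digits.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def split_dest(values):
    """Split dest values into (ips, root_domains), both as sorted lists."""
    ips, roots = set(), set()
    for value in values:
        value = value.strip().lower()
        if not value:
            continue
        if IPV4.match(value):
            ips.add(value)
        else:
            # Naive rule: keep only the last two labels (rootdomain.tld).
            roots.add(".".join(value.split(".")[-2:]))
    return sorted(ips), sorted(roots)

samples = [
    "09458kf84ks8.subdom.rootdomain.tld",
    "www.yahoo.com",
    "1.bad.blogspot.ru",
    "54.239.127.240",
    "65.media.tumblr.com ",
    "a.abc.com",
]
ips, roots = split_dest(samples)
print(ips)    # ['54.239.127.240']
print(roots)  # ['abc.com', 'blogspot.ru', 'rootdomain.tld', 'tumblr.com', 'yahoo.com']
```

Using sets mirrors what stats values() does in the search: each IP or root domain is listed once.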

1 Solution

packet_hunter
Contributor

Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the first / after the host.

index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)
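
The pattern can also be checked outside Splunk. The sketch below is an illustration in Python, not part of the original search: it applies the same character class to the sample string, with the capture group named domain to match the stats values(domain) call. Python's re module spells named groups (?P<domain>...); the helper name extract_domain is hypothetical.

```python
import re

# Same character class as the rex; Python writes the named group
# as (?P<domain>...).
HOST = re.compile(r"//(?P<domain>[\w\d.]+)")

def extract_domain(url):
    """Return the host part captured by the pattern, or None if no match."""
    match = HOST.search(url)
    return match.group("domain") if match else None

url = "http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg"
print(extract_domain(url))  # a57.foxnews.9.com
```

One caveat: the class [\w\d.] does not include a hyphen, so a host such as my-site.example.com would be cut off at "my". Adding - to the class is one way to cover such hosts.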



packet_hunter
Contributor

Any REX KungFu Master... little help

I cannot get this rex to work in Splunk, although it works in the online regex tester at https://regex101.com/
Sample string:
http://a57.foxnews.9.com/www.foxnews.com/ucat/images/121/91/302070_ford_121.jpg
I just want a57.foxnews.9.com, dropping everything from the first / after the host.
I have tried the following, but it keeps erroring out:
index=webout | rex field=url "\/\/(?<domain>[\w\d.]+)" | stats values(domain)

Any help is much appreciated


packet_hunter
Contributor

Never mind, it is working now; it must have been a glitch.
