
Field extraction problems

Brandon_ganem1
Path Finder

I'm having an issue extracting a field from proxy log data. I've created a regex that should extract the TLD of the visited domain (com, net, etc.).

I've tried two regex patterns:
REGEX = ^(?i).*.(?[a-zA-Z]+)(:\d+)?$ (Splunk-generated)
REGEX = (?i)..*?.(?P[a-z]+)(?=/) (my own regex)

Sample values (note: the field will contain only google.com or google.com:port):

WORKS ON:
google.com
google.com/something/goes/here.html

DOESN'T WORK:
google.com:443
google.com:443/something/goes/here.html
testsite.com:8080
testsite.com:8080/something/goes/here.html

Transforms entry:
[tld_extract]
SOURCE_KEY = domain0
REGEX = (?i)..*?.(?P[a-z]+)(?=/)

Also, both regexes work if you use them at search time, regardless of the website.

EXAMPLE:
index=myproxyindex sourcetype="mysourcetype" | rex field=domain "(?i)..*?.(?P[a-z]+)(?=/)"

This works on traffic both with and without ports in the "domain" field.


hexx
Splunk Employee

How about:

REGEX = \.\w+(\.\w+)(?:$|/|:\d+)
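One way to see what this pattern captures is to run it through Python's `re` module; for plain ASCII hostnames like these it should behave the same as the PCRE engine Splunk uses. This is only a sketch: `www.google.com` is a hypothetical value that does not appear in the question, and `capture_suffix` is an illustrative helper name.

```python
import re

# The suggested pattern, copied verbatim from the reply.
# Group 1 captures the last dot-separated label *including* its leading dot.
PATTERN = re.compile(r"\.\w+(\.\w+)(?:$|/|:\d+)")


def capture_suffix(value):
    """Hypothetical helper: return group 1 of the first match, or None."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


# Hypothetical value with a subdomain (not from the original post): matches.
print(capture_suffix("www.google.com:443/something/goes/here.html"))  # .com
# A two-label host has only one dot, so \.\w+(\.\w+) cannot match.
print(capture_suffix("google.com:443"))  # None
```

Because the pattern needs two dots before the port or path, it returns nothing for bare values such as `google.com:443`, which are the kind of values described in the question.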

Brandon_ganem1
Path Finder

So I played with your regex and got it to work for what I was looking for. Updated regex: \w+(.(?[a-zA-Z]+))(?:$|\/|:\d+)
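The updated pattern can be checked against all six sample values from the question with a short Python script. This is a sketch under assumptions: the forum display appears to have dropped the group name (so `tld` below is a hypothetical name) and probably the backslash before the first dot (so it is written as `\.`). Python's `re` also requires the `(?P<name>...)` form for named groups, which PCRE accepts too. `\/` is written as a plain `/`, which means the same thing.

```python
import re

# Reconstruction of the updated regex. Assumptions: "tld" is a hypothetical
# group name, and the first "." was originally an escaped "\.".
TLD_PATTERN = re.compile(r"\w+(\.(?P<tld>[a-zA-Z]+))(?:$|/|:\d+)")

SAMPLES = [
    "google.com",
    "google.com/something/goes/here.html",
    "google.com:443",
    "google.com:443/something/goes/here.html",
    "testsite.com:8080",
    "testsite.com:8080/something/goes/here.html",
]


def extract_tld(value):
    """Return the 'tld' group from the leftmost match, or None if no match."""
    match = TLD_PATTERN.search(value)
    return match.group("tld") if match else None


for sample in SAMPLES:
    print(sample, "->", extract_tld(sample))
```

The alternation `(?:$|/|:\d+)` only requires the TLD to be followed by the end of the value, a slash, or a port, so no optional port group is needed. Because `search` returns the leftmost match, the `.html` later in the path is never reached.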

Thank you for your help. I'm thinking Splunk didn't like the use of ? to make (:\d+) optional in ^(?i).*.(?[a-zA-Z]+)(:\d+)?$


hexx
Splunk Employee

So, to be clear, you want the regex to extract "google.com" out of "google.com:443/something/goes/here.html" for example?


Brandon_ganem1
Path Finder

Hexx,
Thanks for your reply. It doesn't look like this is quite what I want for an extraction. That's my fault: I changed the info in the OP to reflect the values that will be present in the field (sorry!).

What's interesting is that everything points to both my regex and the Splunk-generated regex working correctly. In fact, they do work on fields with no port information. When port information is present, the regex works at search time and in tests (in Python and Rubular), but not in the transform.
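A Python test of the Splunk-generated pattern on host-only and host:port values might look like the sketch below. Assumptions: `tld` is a hypothetical group name (the original was lost from the post), and the dot before the group is taken to be an escaped `\.`. `(?i)` is moved to the very start because Python 3.11 and later reject global inline flags anywhere else, whereas PCRE accepts `^(?i)...`.

```python
import re

# Reconstruction of the Splunk-generated pattern (assumptions noted above:
# hypothetical "tld" group name, escaped "\.", and (?i) moved to the front
# so Python 3.11+ accepts it).
GENERATED = re.compile(r"(?i)^.*\.(?P<tld>[a-zA-Z]+)(:\d+)?$")

for value in ["google.com", "google.com:443", "testsite.com:8080"]:
    match = GENERATED.search(value)
    # The optional (:\d+)? group lets one pattern match with or without a port.
    print(value, "->", match.group("tld") if match else None)
```

A passing result here only shows that the regex itself is valid and handles the optional port; it does not exercise the transforms.conf side of the extraction.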
