Splunk Search

How to use rex to perform field extraction from the end of the field as opposed to the beginning

Explorer

I want to extract the top level domain from the CN field of a certificate in Splunk. The CN may have multiple levels, and I only want the last "something dot something" (e.g. mysite.com), as opposed to the full CN (e.g. server.content.mysite.com). Any suggestions?

Path Finder

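You can anchor your regex to the end of the line using $. Match a literal period, then one or more characters that aren't a period, then another period, then one or more characters that aren't a period, all anchored to the end of the line. See below (the capture group name, domain here, is only a placeholder for whatever field name you want):

\.(?<domain>[^.]+\.[^.]+)$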
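To try the end-anchored pattern without touching real data, a minimal sketch along these lines may help. It assumes the CN is already in a field called cn and names the extracted field domain; both names, and the sample value, are made up for illustration. The periods are escaped so they match literally.

```spl
| makeresults
| eval cn="server.content.mysite.com"
| rex field=cn "\.(?<domain>[^.]+\.[^.]+)$"
| table cn domain
```

For the sample value this should leave domain set to mysite.com. Note that the leading period means the pattern only matches when there is at least one label in front of the last two, so a CN that is already just mysite.com would leave domain empty.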

While that answers your question, the better solution that I would suggest is the URL Toolbox app (https://splunkbase.splunk.com/app/2734/). Determining the TLD and other subcomponents of a URL, or computing stats around them, can be pretty complicated. For example, a regex that keeps the last two labels returns co.uk for server.mysite.co.uk, not mysite.co.uk. The app provides a lot of tools for this so you don't have to reinvent the wheel. When you download the app, there's a README that explains how it works and which parts are most relevant to you.
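As a rough sketch of what the URL Toolbox approach might look like: this assumes the app is installed and that its ut_parse_extended macro and the ut_domain and ut_tld output fields behave as described in the app's README, so check the README for your version before relying on it. The cn field name and the sample value are made up for illustration.

```spl
| makeresults
| eval cn="server.content.mysite.co.uk"
| eval list="mozilla"
| `ut_parse_extended(cn, list)`
| table cn ut_domain ut_tld
```

The list value tells the macro which public suffix list to consult. That lookup is what lets it handle multi-part suffixes that a plain regex cannot.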

Explorer

Thank you for the quick response.

Builder

This will match the last 2 elements of a CN:

| rex "^.*?(?<theField>[^.]+\.\w+)$"

See regex101: https://regex101.com/r/ev3p1K/2
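A quick way to check this pattern against CNs of different depths is a throwaway search like the one below. It is only a sketch: the sample CNs are invented, and field=cn is an assumption about where the CN lives (without field=, rex runs against _raw).

```spl
| makeresults
| eval cn=split("server.content.mysite.com,www.mysite.com,mysite.com", ",")
| mvexpand cn
| rex field=cn "^.*?(?<theField>[^.]+\.\w+)$"
| table cn theField
```

All three rows should end up with theField set to mysite.com, including the CN that has no subdomain, because the lazy .*? prefix is allowed to match nothing.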
