Splunk Search

Field Extraction RegEx Fu Help

meatago
Explorer

Splunk is indexing records from DNS Debug logs just fine, but I'd like to extract and transform the domain names in the DNS requests from '(4)news(1)l(6)google(3)com(0)' to 'news.l.google.com'.

Splunk comes up with an auto-generated regex of '(?i) A (?P.*)'. Can I transform the domain names into a familiar format with field extractions?

excerpt from logs:

7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd 208.67.222.222  1601   Q [1001   D   NOERROR] A      (4)news(1)l(6)google(3)com(0)

7/9/2010 2:15:37 PM 0100 PACKET  0000000003D89FA0 UDP Rcv 208.67.222.222  31c8 R Q [8081   DR  NOERROR] A      (6)images(6)google(3)com(0)
1 Solution

ftk
Motivator

As far as I know you cannot do replacements in the field extraction to get the periods inserted, but you should be able to transform the domain name with SEDCMD and then extract the field.

You can test your extractions and the sed replacement from the GUI first and then put them in your config files later. The following search should work:

sourcetype="my_dns_debug_logs" | rex "(?<extract_domain>([\w-]+\(\d+\))+)$" | rex mode=sed field=extract_domain "s/\(\d+\)/./g"

The first rex command extracts the domain name, and the second one replaces the (n) instances with a period.

If this works for you in the GUI, then you can get started on using SEDCMD: http://www.splunk.com/base/Documentation/4.1.3/Admin/Anonymizedatawithsed

Since SEDCMD only applies to the _raw field at index time, use a stanza such as the following in props.conf:

[my_dns_debug_logs]
SEDCMD-domainname = s/\(\d+\)/./g

This will result in your events actually being indexed with the readable domain name, so you should then be able to use a simple field extraction. Don't forget to restart Splunk after the props.conf change.
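If you want to sanity-check the two steps outside Splunk first, here is a minimal Python sketch of the same idea. It is not Splunk code: the function name extract_domain is made up for illustration, Python spells named groups as (?P<name>...), and the pattern is written on the assumption that length markers can have several digits and labels can contain hyphens.

```python
import re

# Sample event copied from the log excerpt in the question.
event = ("7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd "
         "208.67.222.222  1601   Q [1001   D   NOERROR] A      "
         "(4)news(1)l(6)google(3)com(0)")

# Step 1: mimic the first rex, capturing label(n) pairs up to end of line.
EXTRACT = re.compile(r"(?P<extract_domain>([\w-]+\(\d+\))+)$")
# Step 2: mimic rex mode=sed, turning every (n) marker into a period.
MARKER = re.compile(r"\(\d+\)")

def extract_domain(raw):
    """Hypothetical helper: return the dotted name, or None if no match."""
    m = EXTRACT.search(raw)
    if m is None:
        return None
    return MARKER.sub(".", m.group("extract_domain"))

print(extract_domain(event))
```

With this pattern the final (0) marker also becomes a period, so the value ends in a trailing dot that you may want to trim in a later step.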


MillerTime
Splunk Employee

The config below worked to clean the raw data at index time. It's not pretty (it takes three SEDCMD passes), but it replaces each (#) with a period and then removes the leading and trailing periods left behind by that replacement.

# props.conf

[dnslog]
SEDCMD-cleandns1 = s/\([0-9]+\)/./g
SEDCMD-cleandns2 = s/\.$//g
SEDCMD-cleandns3 = s/\s\./ /g
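
The three passes can be dry-run outside Splunk with a short Python sketch like the one below. It only imitates the substitutions with re.sub and assumes the parentheses and periods are escaped so they match literally. The function name clean_raw is hypothetical.

```python
import re

# Sample events copied from the log excerpt in the question.
events = [
    "7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd "
    "208.67.222.222  1601   Q [1001   D   NOERROR] A      "
    "(4)news(1)l(6)google(3)com(0)",
    "7/9/2010 2:15:37 PM 0100 PACKET  0000000003D89FA0 UDP Rcv "
    "208.67.222.222  31c8 R Q [8081   DR  NOERROR] A      "
    "(6)images(6)google(3)com(0)",
]

def clean_raw(raw):
    """Hypothetical helper imitating the three sed passes on one event."""
    step1 = re.sub(r"\(\d+\)", ".", raw)   # pass 1: every (n) becomes a period
    step2 = re.sub(r"\.$", "", step1)      # pass 2: drop the trailing period
    step3 = re.sub(r"\s\.", " ", step2)    # pass 3: drop a period that follows whitespace
    return step3

for e in events:
    print(clean_raw(e))
```

Keep in mind that the passes run over the whole event, so any other (n) sequence or whitespace-then-period elsewhere in the line would be rewritten as well.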


gkanapathy
Splunk Employee

Yeah, you cannot currently do this with a regular search-time extraction, but you can do it at search time once you have the data in a field:

... | eval newurl=replace(replace(url,"(?<!^)\(\d+\)","."),"^\(\d+\)","")

At index time, you can use SEDCMD, as ftk describes.
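
For anyone who wants to see what the nested replace does step by step, here is a rough Python equivalent. It is a sketch only: decode is a hypothetical name, and re.sub stands in for eval's replace().

```python
import re

def decode(url):
    """Hypothetical helper mirroring the nested replace() calls."""
    # Inner call: every (n) that is NOT at the start of the string becomes a period.
    inner = re.sub(r"(?<!^)\(\d+\)", ".", url)
    # Outer call: the (n) at the very start is removed outright.
    return re.sub(r"^\(\d+\)", "", inner)

encoded = "(4)news(1)l(6)google(3)com(0)"
print(decode(encoded))              # keeps the period produced by the final (0)
print(decode(encoded).rstrip("."))  # optional extra trim, not in the eval above
```

The negative lookbehind is what keeps the leading marker from turning into a stray period. The final (0) still becomes one, so trim it separately if you need the bare name.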

meatago
Explorer

rex "((?<extract_domain>(\w+(\(\d\))){1,}?)$)"


meatago
Explorer

rex "(?<extract_domain>(\w+(\(\d\))){1,}?)$)" has one too many right parentheses. Which one should I remove?
