Field Extraction RegEx Fu Help

meatago
Explorer

Splunk is indexing records from DNS debug logs just fine, but I'd like to extract and transform the domain names in the DNS requests from '(4)news(1)l(6)google(3)com(0)' to 'news.l.google.com'.

Splunk comes up with an auto-generated regex of '(?i) A (?P.*)'. Can I transform the domain names into a familiar format with the field extractions?

excerpt from logs:

7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd 208.67.222.222  1601   Q [1001   D   NOERROR] A      (4)news(1)l(6)google(3)com(0)

7/9/2010 2:15:37 PM 0100 PACKET  0000000003D89FA0 UDP Rcv 208.67.222.222  31c8 R Q [8081   DR  NOERROR] A      (6)images(6)google(3)com(0)
1 Solution

ftk
Motivator

As far as I know you cannot do replacements in the field extraction to get the periods inserted, but you should be able to transform the domain name with SEDCMD and then extract the field.

You can test your extractions and SEDCMD from the GUI first and then put them in your config files later. The following search should work:

sourcetype="my_dns_debug_logs" | rex "(?<extract_domain>([\w-]+\(\d+\))+)$" | rex mode=sed field=extract_domain "s/\(\d+\)/./g"

The first rex command extracts the domain name, and the second one replaces the (n) instances with a period.
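
To check the two steps outside Splunk, here is a minimal Python sketch. The extraction pattern is an assumed variant, not the exact expression from the search above: it also captures the leading (n), so that trimming the outer periods leaves a bare name. Python spells named groups (?P<name>...) where rex accepts (?<name>...), and the helper names extract_domain and to_dotted are hypothetical.

```python
import re

# Sample line copied from the log excerpt in the question.
line = ("7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd "
        "208.67.222.222  1601   Q [1001   D   NOERROR] A      "
        "(4)news(1)l(6)google(3)com(0)")

# Step 1: capture the run of "(n)label" pairs plus the closing "(0)"
# at the end of the line (assumed pattern, see the note above).
EXTRACT = re.compile(r"(?P<extract_domain>(?:\(\d+\)[\w-]+)+\(0\))\s*$")


def extract_domain(raw):
    """Return the encoded name at the end of a log line, or None."""
    match = EXTRACT.search(raw)
    return match.group("extract_domain") if match else None


def to_dotted(encoded):
    """Step 2: turn every "(n)" into a period, then trim the outer periods."""
    return re.sub(r"\(\d+\)", ".", encoded).strip(".")


print(to_dotted(extract_domain(line)))
```

Without the final strip, the replacement alone leaves a period at each end, because the leading (4) and the closing (0) are converted as well.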

If this works for you in the GUI, then you can get started on using SEDCMD: http://www.splunk.com/base/Documentation/4.1.3/Admin/Anonymizedatawithsed

Since SEDCMD only applies to the _raw field at index time, use a stanza such as the following in props.conf:

[my_dns_debug_logs]
SEDCMD-domainname = s/\(\d+\)/./g

This will result in new events being indexed with periods in place of the (n) markers (a leading and a trailing period remain, e.g. '.news.l.google.com.'); you should then be able to use a simple field extraction. Don't forget to restart Splunk after the props.conf change.


MillerTime
Splunk Employee

The config below worked to clean the raw data at index time. It's not pretty (it requires three SEDCMD passes), but it replaces each (#) with a period and then removes the leading and trailing periods left behind by that replacement.

# props.conf

[dnslog]
SEDCMD-cleandns1 = s/\([0-9]+\)/./g
SEDCMD-cleandns2 = s/\.$//g
SEDCMD-cleandns3 = s/\s\./ /g
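
As a rough offline check of the three passes, the Python sketch below applies them to one raw line from the question. It assumes the passes run in the order listed and that the parentheses and periods are escaped. The escaping matters: without the backslashes, ([0-9]) matches any single digit, including those in the timestamp. The function name clean is hypothetical.

```python
import re

# Raw event copied from the log excerpt in the question.
raw = ("7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd "
       "208.67.222.222  1601   Q [1001   D   NOERROR] A      "
       "(4)news(1)l(6)google(3)com(0)")

# (pattern, replacement) pairs mirroring the three SEDCMD passes,
# assumed to be applied in this order.
PASSES = [
    (r"\([0-9]+\)", "."),  # cleandns1: "(n)" -> "."
    (r"\.$", ""),          # cleandns2: drop the period at the end of the event
    (r"\s\.", " "),        # cleandns3: whitespace + period -> a single space
]


def clean(event):
    """Apply each substitution to the whole event, one after another."""
    for pattern, replacement in PASSES:
        event = re.sub(pattern, replacement, event)
    return event


cleaned = clean(raw)
print(cleaned)
```

The IP address and timestamp survive because none of their periods or digits sit inside parentheses or directly after whitespace.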


gkanapathy
Splunk Employee

Yeah, you cannot currently do this with a regular search-time extraction, but you can do it at search time once you have the data:

... | eval newurl=replace(replace(url,"(?<!^)\(\d+\)","."),"^\(\d+\)","")
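
The nested replace can be traced in Python as a sketch, assuming Python's re module treats these two patterns the same way Splunk's regex engine does. The inner call skips the first (n) because of the (?<!^) lookbehind, and the outer call then deletes it. The function name newurl is hypothetical and simply mirrors the eval field name.

```python
import re


def newurl(url):
    """Mirror replace(replace(url, inner_pattern, "."), outer_pattern, "")."""
    # Inner replace: every "(n)" that is not at the start becomes a period.
    inner = re.sub(r"(?<!^)\(\d+\)", ".", url)
    # Outer replace: delete the "(n)" left at the start of the string.
    return re.sub(r"^\(\d+\)", "", inner)


print(newurl("(4)news(1)l(6)google(3)com(0)"))
```

The closing (0) is also turned into a period, so the result keeps a trailing period; it can be trimmed afterwards if a bare name is wanted.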

At index time, you can use SEDCMD, as ftk says.


meatago
Explorer

rex "((?<extract_domain>(\w+(\(\d\))){1,}?)$)"


meatago
Explorer

rex "(?<extract_domain>(\w+(\(\d\))){1,}?)$)" has one too many right parentheses. Which one should I remove?
