Splunk Search

Field Extraction RegEx Fu Help

meatago
Explorer

Splunk is indexing records from DNS Debug logs just fine, but I'd like to extract and transform the domain names in the DNS requests from '(4)news(1)l(6)google(3)com(0)' to 'news.l.google.com'.

Splunk comes up with an auto-generated regex of '(?i) A (?P.*)'. Can I transform the domain names into a familiar format with the field extractions?

excerpt from logs:

7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd 208.67.222.222  1601   Q [1001   D   NOERROR] A      (4)news(1)l(6)google(3)com(0)

7/9/2010 2:15:37 PM 0100 PACKET  0000000003D89FA0 UDP Rcv 208.67.222.222  31c8 R Q [8081   DR  NOERROR] A      (6)images(6)google(3)com(0)
1 Solution

ftk
Motivator

As far as I know you cannot do replacements in the field extraction to get the periods inserted, but you should be able to transform the domain name with SEDCMD and then extract the field.

You can test your extractions and SEDCMD from the GUI first and then put them in your config files later. The following search should work:

sourcetype="my_dns_debug_logs" | rex "(?<extract_domain>(\w+\(\d+\))+)$" | rex mode=sed field=extract_domain "s/\(\d+\)/./g"

The first rex command extracts the domain name, and the second one replaces the (n) instances with a period.
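For anyone who wants to check the two steps outside Splunk, here is a rough Python sketch of the same idea: extract the trailing label(n) run, then swap each (n) for a period. The function and constant names are made up for illustration, and Python's re module only approximates what rex does, so treat it as a sanity check rather than a reference.

```python
import re

# Mirrors the first rex: one or more "label(n)" groups anchored at end of line.
# The leading "(4)" is not captured because the match has to start at a label.
DOMAIN_RE = re.compile(r"(?P<extract_domain>(?:\w+\(\d+\))+)$")
# Mirrors the sed expression s/\(\d+\)/./g.
LENGTH_MARKER_RE = re.compile(r"\(\d+\)")


def extract_domain(raw_event):
    """Return the dotted domain from one DNS debug log line, or None."""
    match = DOMAIN_RE.search(raw_event)
    if match is None:
        return None
    return LENGTH_MARKER_RE.sub(".", match.group("extract_domain"))


sample = ("7/9/2010 2:15:37 PM 0100 PACKET  0000000002BC5D80 UDP Snd "
          "208.67.222.222  1601   Q [1001   D   NOERROR] A      "
          "(4)news(1)l(6)google(3)com(0)")
print(extract_domain(sample))  # news.l.google.com.
```

Note the trailing period: the final (0) marker is replaced like any other, so a further trim is needed if you want a bare hostname.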

If this works for you in the GUI, then you can get started on using SEDCMD: http://www.splunk.com/base/Documentation/4.1.3/Admin/Anonymizedatawithsed

Since SEDCMD only applies to the _raw field at index time, use a stanza such as the following in props.conf:

[my_dns_debug_logs]
SEDCMD-domainname = s/\(\d+\)/./g

This will result in your events actually being indexed with periods in place of the (n) length markers (a leading and a trailing period remain), and then you should be able to use a simple field extraction at that point. Don't forget to restart Splunk after the props.conf change.


MillerTime
Splunk Employee

The config below worked to clean the raw data at index time. It's not pretty (it requires three SEDCMD passes), but it replaces each (#) with a period and then removes the leading and trailing periods left over from the (#) -> . replacement.

# props.conf

[dnslog]
SEDCMD-cleandns1 = s/\([0-9]+\)/./g
SEDCMD-cleandns2 = s/\.$//g
SEDCMD-cleandns3 = s/\s\./ /g
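To see what the three passes do to a whole event, a small Python simulation may help. This is only a sketch: clean_raw and PASSES are hypothetical names, and it assumes the passes run in the order listed and that the domain is the last thing on the line, as in the sample logs.

```python
import re

# (pattern, replacement) pairs, applied in order like the three SEDCMD passes.
PASSES = [
    (r"\([0-9]+\)", "."),  # (n) length markers -> periods
    (r"\.$", ""),          # drop the trailing period left by the final (0)
    (r"\s\.", " "),        # drop the leading period left by the first (n)
]


def clean_raw(raw_event):
    """Apply each substitution to the full raw event, one pass at a time."""
    for pattern, replacement in PASSES:
        raw_event = re.sub(pattern, replacement, raw_event)
    return raw_event


sample = ("7/9/2010 2:15:37 PM 0100 PACKET  0000000003D89FA0 UDP Rcv "
          "208.67.222.222  31c8 R Q [8081   DR  NOERROR] A      "
          "(6)images(6)google(3)com(0)")
print(clean_raw(sample))
```

The escaped parentheses matter here: without them the first pattern would match every single digit in the event, including the timestamp and IP address.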


gkanapathy
Splunk Employee

Yeah, you cannot currently do this with a regular search-time extraction, but you can do it at search time once you have the extracted field (url here):

... | eval newurl=replace(replace(url,"(?<!^)\(\d+\)","."),"^\(\d+\)","")
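The nested replace is easier to follow one step at a time. Below is a Python approximation (to_dotted is a made-up helper name, and Python's regex engine is standing in for Splunk's): the inner call turns every (n) except the one at the very start into a period, and the outer call then deletes that leading (n).

```python
import re


def to_dotted(url):
    """Python stand-in for replace(replace(url, ...), ...) in the eval."""
    # Inner replace: every (n) that is NOT at the start of the string.
    inner = re.sub(r"(?<!^)\(\d+\)", ".", url)
    # Outer replace: remove the leading (n) entirely.
    return re.sub(r"^\(\d+\)", "", inner)


print(to_dotted("(4)news(1)l(6)google(3)com(0)"))  # news.l.google.com.
```

As with the sed approach, the final (0) becomes a trailing period, so the result is a fully qualified name with the root dot.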

At index time, you can use SEDCMD as ftk describes.


meatago
Explorer

rex "((?<extract_domain>(\w+(\(\d\))){1,}?)$)"


meatago
Explorer

rex "(?<extract_domain>(\w+(\(\d\))){1,}?)$)" has one too many right parentheses. Which one should I remove?
