Splunk is indexing records from DNS debug logs just fine, but I'd like to extract and transform the domain names in the DNS requests from '(4)news(1)l(6)google(3)com(0)' to 'news.l.google.com'.
Splunk comes up with an auto-generated regex of '(?i) A (?P.*)'. Can I transform the domain names into a familiar format with the field extractions?
Excerpt from the logs:
7/9/2010 2:15:37 PM 0100 PACKET 0000000002BC5D80 UDP Snd 208.67.222.222 1601 Q [1001 D NOERROR] A (4)news(1)l(6)google(3)com(0)
7/9/2010 2:15:37 PM 0100 PACKET 0000000003D89FA0 UDP Rcv 208.67.222.222 31c8 R Q [8081 DR NOERROR] A (6)images(6)google(3)com(0)
As far as I know you cannot do replacements in the field extraction to get the periods inserted, but you should be able to transform the domain name with SEDCMD and then extract the field.
You can test your extractions and the sed expression from the GUI first and then put them in your config files later. The following search should work:
sourcetype="my_dns_debug_logs" | rex "(?<extract_domain>([\w-]+\(\d+\))+)$" | rex mode=sed field=extract_domain "s/\(\d+\)/./g"
The first rex command extracts the domain name, and the second one replaces each (n) with a period, so the trailing (0) leaves a final period on the name.
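If you want to sanity-check the extract-then-replace idea outside Splunk, here is a minimal sketch in Python (not SPL). The pattern is an assumption for illustration, not the exact regex from the search above: it captures the run of (n)label groups at the end of the line, swaps each (n) for a period, and trims the periods left at both ends. Python's re module is not PCRE, but these particular constructs should behave the same in both.

```python
import re

# Assumed pattern: one or more "(n)label" groups anchored at the end of the event.
DOMAIN_RE = re.compile(r"((?:\(\d+\)[\w-]*)+)$")

def extract_domain(event):
    """Return the dotted domain name from a DNS debug log line, or None."""
    match = DOMAIN_RE.search(event)
    if match is None:
        return None
    # Replace every length marker with a period, then drop the outer periods.
    return re.sub(r"\(\d+\)", ".", match.group(1)).strip(".")

line = ("7/9/2010 2:15:37 PM 0100 PACKET 0000000002BC5D80 UDP Snd "
        "208.67.222.222 1601 Q [1001 D NOERROR] A (4)news(1)l(6)google(3)com(0)")
print(extract_domain(line))  # news.l.google.com
```

Using \d+ rather than \d matters here because a label of ten or more characters is logged with a two-digit length.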
If this works for you in the GUI, then you can get started on using SEDCMD: http://www.splunk.com/base/Documentation/4.1.3/Admin/Anonymizedatawithsed
SEDCMD only applies to the _raw field at index time, so use a stanza such as the following in props.conf:
[my_dns_debug_logs]
SEDCMD-domainname = s/\(\d+\)/./g
This will result in your events actually being indexed with periods in the domain name -- then you should be able to use a simple field extraction at that point. Don't forget to restart Splunk after the props.conf change.
The config below worked to clean the raw data at index time. It's not pretty (it requires 3 SEDCMD passes), but it does replace (#) with periods and then removes the leading and trailing periods left over from the (#) -> . replacement.
# props.conf
[dnslog]
SEDCMD-cleandns1 = s/\([0-9]+\)/./g
SEDCMD-cleandns2 = s/\.$//g
SEDCMD-cleandns3 = s/\s\./ /g
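To see what three passes like these do to a raw event, here is a rough Python simulation. It assumes the intended patterns are the backslash-escaped forms (\(\d+\), \.$ and \s\.), and the function name clean_dns_event is made up for the example; it only mimics the substitutions, not Splunk's index-time pipeline.

```python
import re

def clean_dns_event(raw):
    """Apply three substitutions in order, mimicking three SEDCMD passes on _raw."""
    raw = re.sub(r"\(\d+\)", ".", raw)  # pass 1: (n) -> .
    raw = re.sub(r"\.$", "", raw)       # pass 2: drop the trailing period
    raw = re.sub(r"\s\.", " ", raw)     # pass 3: drop a period that follows whitespace
    return raw

raw = ("7/9/2010 2:15:37 PM 0100 PACKET 0000000003D89FA0 UDP Rcv "
       "208.67.222.222 31c8 R Q [8081 DR NOERROR] A (6)images(6)google(3)com(0)")
print(clean_dns_event(raw))
```

Note that the periods inside the IP address are untouched, since none of them sits at the end of the line or directly after whitespace.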
Yeah, you cannot do this currently with a regular search-time extraction, but you can do it at search time once you have the data:
... | eval newurl=replace(replace(url,"(?<!^)\(\d+\)","."),"^\(\d+\)","")
At index time, you can use SEDCMD similarly as ftk says.
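For anyone who wants to check what the nested replace produces, here is a small Python sketch of the same two substitutions. The function name to_dotted is hypothetical, and Python's re stands in for Splunk's regex engine, so treat it as an approximation.

```python
import re

def to_dotted(url):
    """Mirror the nested replace: inner pass first, then the outer pass."""
    # Inner replace: every (n) that is not at the very start becomes a period.
    inner = re.sub(r"(?<!^)\(\d+\)", ".", url)
    # Outer replace: drop the (n) that is still at the start.
    return re.sub(r"^\(\d+\)", "", inner)

print(to_dotted("(4)news(1)l(6)google(3)com(0)"))  # news.l.google.com.
```

The trailing (0) turns into a final period, so the result is 'news.l.google.com.' rather than 'news.l.google.com'. If you don't want it, wrapping one more replace around the expression with the pattern "\.$" should strip it.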