How to handle different field patterns in the same sourcetype?

gpullis
Communicator

I'm trying to extract fields for a Barracuda Spam Firewall. For those deeply interested, they've politely documented their syslog output here.

I've gotten as far as the following regex:

(?:\S*\s){5}(?<barracuda_process>[\w/]*)\[(?<barracuda_pid>\d*)\]:\s(?<client_ip>.*\]|127\.0\.0\.1)\s(?<message_id>[\w-]*)\s(?<start_time>\d*)\s(?<end_time>\d*)\s(?<service>RECV|SCAN|SEND)\s(?<info>.*)
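
A quick way to see the problem is to run the regex outside Splunk. The sketch below is a Python translation (Python's re module spells named groups (?P<name>...), whereas Splunk's PCRE also accepts (?<name>...)), with the dots in 127.0.0.1 escaped. The sample line is made up to fit the regex, with five header tokens ahead of the process name because that is what {5} skips; it is not real Barracuda output.

```python
import re

# The question's regex translated for Python's re module. The dots in
# 127.0.0.1 are escaped so they only match literal dots.
BARRACUDA_RE = re.compile(
    r"(?:\S*\s){5}"  # skip five header tokens
    r"(?P<barracuda_process>[\w/]*)\[(?P<barracuda_pid>\d*)\]:\s"
    r"(?P<client_ip>.*\]|127\.0\.0\.1)\s"
    r"(?P<message_id>[\w-]*)\s"
    r"(?P<start_time>\d*)\s(?P<end_time>\d*)\s"
    r"(?P<service>RECV|SCAN|SEND)\s"
    r"(?P<info>.*)"  # everything after the service, left unsplit
)

# Hypothetical line shaped to fit the regex; NOT real Barracuda output.
SAMPLE = (
    "Jan 5 12:00:00 relay barracuda inbound/pass1[1234]: "
    "mx.example.com[192.0.2.10] 1294228800-abc123-1 1294228800 1294228801 "
    "SEND 0 1 ABC123 250 Ok queued"
)

fields = BARRACUDA_RE.match(SAMPLE).groupdict()
for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Every service-specific value ends up in a single info string, which is exactly the issue described next.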

The problem is that my info field should really be multiple fields based on the value of the service field.

For example, if service="SCAN", the subsequent fields should be:

Encrypted Sender Recipient Score Action Reason ReasonExtra "SUBJ:"Subject

While, if service="SEND", the subsequent fields should be:

Encrypted Action QueueID Response
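
The target result can be sketched outside Splunk first. This Python sketch splits the info text using the field orders listed above; RECV is left out because its layout isn't given. It assumes every field except the last is a single whitespace-free token and that the trailing subject or response is free text. Events with empty or multi-word fields in the middle would break that assumption. The snake_case field names and the split_info helper are illustrative, not Barracuda's or Splunk's.

```python
# Field order per service, taken from the question. The last field (subject
# or response) is handled separately because it may contain spaces.
LAYOUTS = {
    "SCAN": ["encrypted", "sender", "recipient", "score", "action",
             "reason", "reason_extra"],          # then "SUBJ:<subject>"
    "SEND": ["encrypted", "action", "queue_id"],  # then free-text response
}


def split_info(service: str, info: str) -> dict[str, str]:
    """Hypothetical helper: split info into per-service fields.

    Assumes every field but the last is one whitespace-free token.
    """
    names = LAYOUTS[service]
    parts = info.split(None, len(names))  # at most len(names) splits
    fields = dict(zip(names, parts))
    rest = parts[len(names)] if len(parts) > len(names) else ""
    if service == "SCAN":
        fields["subject"] = rest.removeprefix("SUBJ:")
    else:
        fields["response"] = rest
    return fields


print(split_info("SEND", "0 1 ABC123 250 Ok queued"))
# {'encrypted': '0', 'action': '1', 'queue_id': 'ABC123', 'response': '250 Ok queued'}
```

Splitting with a maximum count keeps a multi-word subject or SMTP response in one field instead of scattering it across several.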

What's the best way to get these fields extracted?

Thanks, y'all!

dwaddle
SplunkTrust

gpullis, if you are satisfied with one of the answers don't forget to 'accept' it by clicking the outlined check-box to the left of it.

mw
Splunk Employee

I believe the more "modern" way to do what dwaddle suggested would be to use the EXTRACT keyword in props.conf alone, with no transforms.conf stanza needed. Each EXTRACT runs against every event, so you can split the extraction into a common part plus one part per service. Something like:

[cuda]
EXTRACT-common = (?:\S*\s){5}(?<barracuda_process>[\w/]*)\[(?<barracuda_pid>\d*)\]:\s(?<client_ip>.*\]|127\.0\.0\.1)\s(?<message_id>[\w-]*)\s(?<start_time>\d*)\s(?<end_time>\d*)
EXTRACT-scan_msg = (?<service>SCAN)\s(?<encrypted>\w+)\s(?<sender>\S+)
EXTRACT-send_msg = (?<service>SEND)\s(?<encrypted>\w+)\s(?<action>\w+)
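
To check the idea before touching props.conf, here is a simplified Python model of that stanza. It is an illustration, not Splunk's implementation: each regex is tried on its own against the event, and the named groups of every regex that matches are merged. The sender pattern uses \S+ because \w+ stops at the @ in an email address. The events are made up to fit the regexes.

```python
import re

# Simplified model of the EXTRACT-* stanza (not Splunk code): every regex is
# tried independently and the named groups of each match are merged.
EXTRACTS = [
    # EXTRACT-common: fields shared by every service
    re.compile(
        r"(?:\S*\s){5}(?P<barracuda_process>[\w/]*)\[(?P<barracuda_pid>\d*)\]:\s"
        r"(?P<client_ip>.*\]|127\.0\.0\.1)\s(?P<message_id>[\w-]*)\s"
        r"(?P<start_time>\d*)\s(?P<end_time>\d*)"
    ),
    # EXTRACT-scan_msg: \S+ keeps the @ and dots of the sender address
    re.compile(r"(?P<service>SCAN)\s(?P<encrypted>\w+)\s(?P<sender>\S+)"),
    # EXTRACT-send_msg
    re.compile(r"(?P<service>SEND)\s(?P<encrypted>\w+)\s(?P<action>\w+)"),
]


def extract_fields(event: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    for regex in EXTRACTS:
        match = regex.search(event)  # unanchored: may match anywhere
        if match:
            fields.update(match.groupdict())
    return fields


# Hypothetical events shaped to fit the regexes; NOT real Barracuda output.
HEADER = ("Jan 5 12:00:00 relay barracuda inbound/pass1[1234]: "
          "mx.example.com[192.0.2.10] 1294228800-abc123-1 1294228800 1294228801 ")
SCAN_EVENT = HEADER + "SCAN 0 alice@example.com bob@example.org 0.5 0 - - SUBJ:Hello"
SEND_EVENT = HEADER + "SEND 0 1 ABC123 250 Ok queued"

print(extract_fields(SCAN_EVENT)["sender"])    # alice@example.com
print("sender" in extract_fields(SEND_EVENT))  # False
```

Because the service regexes are unanchored, a subject line containing the word SCAN or SEND could also trigger them; anchoring them to the digits of end_time would make that less likely.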

dwaddle
SplunkTrust

In props.conf you can define two different REPORT rules, one for service=SCAN and the other for service=SEND. Something like this:

(props.conf)

[cuda]
REPORT-scan=cudascan
REPORT-send=cudasend

(transforms.conf)

[cudascan]
REGEX=(?:\S*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s([\w-]*)\s(\d*)\s(\d*)\s(SCAN)\s(.*)
FORMAT= barracuda_process::$1 barracuda_pid::$2 client_ip::$3 message_id::$4 start_time::$5 end_time::$6 service::$7 info::$8

[cudasend]
REGEX=(?:\S*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s([\w-]*)\s(\d*)\s(\d*)\s(SEND)\s(.*)
FORMAT= barracuda_process::$1 barracuda_pid::$2 client_ip::$3 message_id::$4 start_time::$5 end_time::$6 service::$7 info::$8

Both regexes are tested against every event, but only one will match: SCAN or SEND. My regexes still need work to pull the SCAN- and SEND-specific fields out of info correctly, but this should let you tell the two apart and extract fields accordingly. (Note that you would also need a third rule for RECV.)
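
FORMAT refers to capture groups by number, and a (?:...) group does not count, so in these regexes the process name is $1 and info is $8. A rough Python model of one REPORT rule makes both points checkable: it reports the group count, and it shows that the SEND rule yields nothing for a SCAN event. The apply_report helper is hypothetical, not a Splunk API, and the events are made up.

```python
import re

# cudasend REGEX with 127.0.0.1 escaped. The leading (?:...) is
# non-capturing, so $1 = process, $2 = pid, ..., $8 = info; there is no $9.
CUDASEND = re.compile(
    r"(?:\S*\s){5}([\w/]*)\[(\d*)\]:\s(.*\]|127\.0\.0\.1)\s([\w-]*)\s"
    r"(\d*)\s(\d*)\s(SEND)\s(.*)"
)
FORMAT = ("barracuda_process::$1 barracuda_pid::$2 client_ip::$3 message_id::$4 "
          "start_time::$5 end_time::$6 service::$7 info::$8")


def apply_report(regex: re.Pattern, fmt: str, event: str) -> dict[str, str]:
    """Hypothetical model of REPORT + FORMAT: map each name::$N to group N."""
    match = regex.search(event)
    if match is None:  # this rule does not fire for this event
        return {}
    fields: dict[str, str] = {}
    for pair in fmt.split():
        name, ref = pair.split("::")
        fields[name] = match.group(int(ref.lstrip("$")))
    return fields


# Hypothetical events shaped to fit the regex; NOT real Barracuda output.
HEADER = ("Jan 5 12:00:00 relay barracuda inbound/pass1[1234]: "
          "mx.example.com[192.0.2.10] 1294228800-abc123-1 1294228800 1294228801 ")
SEND_EVENT = HEADER + "SEND 0 1 ABC123 250 Ok queued"
SCAN_EVENT = HEADER + "SCAN 0 alice@example.com bob@example.org 0.5 0 - - SUBJ:Hello"

print(CUDASEND.groups)                                     # 8
print(apply_report(CUDASEND, FORMAT, SEND_EVENT)["info"])  # 0 1 ABC123 250 Ok queued
print(apply_report(CUDASEND, FORMAT, SCAN_EVENT))          # {}
```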

dwaddle
SplunkTrust

If you have a couple of (sanitized) samples of each event type, go ahead and edit your question to include them; someone can probably help you nail down your regex. Also, have you seen RegExr? http://gskinner.com/RegExr/

gpullis
Communicator

Your method makes sense, but my implementation has some suck in it. I've posted my failed attempt as a separate question here.

I like mw's idea of using the EXTRACT keyword, but I'm failing in the same way when I try to implement it.

cboggs
Explorer

Hey, did you ever get this figured out? I don't want to reinvent the wheel, and I'd like to get field extraction working for these logs as well.
