Solved: Extracting multiple fields using makemv with a character class tokenizer to parse a Cisco MAC Notification MIB

paulrowen · ‎07-03-2014

Hi. I'm extracting Cisco SNMP traps (yay!) and in particular, the MAC notification MIB. I'm struggling to extract the multiple entries that can appear in the string field.

The string entry is output with no delimiters, as tuples of: Operation (2 chars), VLAN Id (4 chars), MAC (12 chars), D1BasePort Id (4 chars).

Usually we find something like this:

string=0x02003c90b11c5ec073000200

There's a single Operation (02), with one associated VLAN Id (003c), one MAC (90b11c5ec073) and one baseport Id (0002) and then a terminating (00).
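
To make the layout concrete, here is a minimal Python sketch (outside Splunk, purely illustrative) that slices the single-record example into its fields. The function name `parse_record` and the dictionary keys are hypothetical; only the field widths and the sample string come from the post.

```python
def parse_record(record: str) -> dict:
    """Slice one 22-character hex record into its four fixed-width fields."""
    if len(record) != 22:
        raise ValueError("expected 22 hex characters, got %d" % len(record))
    return {
        "operation": record[0:2],     # 2 chars
        "vlan_hex": record[2:6],      # 4 chars
        "mac_address": record[6:18],  # 12 chars
        "port_hex": record[18:22],    # 4 chars
    }


raw = "0x02003c90b11c5ec073000200"
body = raw[2:-2]  # drop the leading "0x" and the terminating "00"
fields = parse_record(body)
print(fields)
```

Fixed-width slicing is enough here because every field has a known length; no delimiter is needed to find the boundaries.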

However, you can of course have multiple MAC change notifications per VLAN and multiple baseport ids per operation. I've read through this post and this one and unfortunately they haven't quite got it right for MAC notification MIBs because there is a one-to-many relationship between the Operation and all of the subsequent fields!

So the string field can also look like this:

string=0x01004108000f5b547e008601004108000f543a1d000401004108000f560508009301004108000f7d19a000ad00

No delimiters whatsoever between the multiple tuples - nice. I want to loop through this string and extract each of the MAC change notifications. In the above string there are four notifications:

a leading 0x, then:

01004108000f5b547e0086
01004108000f543a1d0004
01004108000f5605080093
01004108000f7d19a000ad

and lastly a terminating 00
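
As a cross-check on that breakdown, the following Python sketch (an illustration outside Splunk, not part of the search) cuts the example string into 22-character records. The helper name `split_records` is hypothetical, and the handling of the trailing "00" is an assumption based on the description above.

```python
def split_records(value: str, width: int = 22) -> list:
    """Split a "0x..." trap string into fixed-width hex records."""
    body = value[2:] if value.startswith("0x") else value
    # Assumption from the post: the payload ends with a terminating "00",
    # so exactly two characters are left over after the last full record.
    if body.endswith("00") and len(body) % width == 2:
        body = body[:-2]
    return [body[i:i + width] for i in range(0, len(body), width)]


raw = ("0x01004108000f5b547e008601004108000f543a1d0004"
       "01004108000f560508009301004108000f7d19a000ad00")
records = split_records(raw)
for record in records:
    print(record)
```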

I reckon my tokenizer should look like this (switching to bold here because the board mangles the rest of my post if I continue with the code tag!):

makemv tokenizer="([0-9a-f]{2})([0-9a-f]{4})([0-9a-f]{12})([0-9a-f]{4})" string

or actually just:

makemv tokenizer="([0-9a-f]{22})" string

This splits the long concatenated string up. However when I view the output of the following query:

index=main sourcetype="cisco:snmp" string=0x01004108000f5b547e008601004108000f543a1d000401004108000f560508009301004108000f7d19a000ad00 | makemv tokenizer="([0-9a-f]{2})([0-9a-f]{4})([0-9a-f]{12})([0-9a-f]{4})" string | mvexpand string | rex "(?[sa-fA-F0-9]{2})(?[sa-fA-F0-9]{4})(?[sa-fA-F0-9]{12})(?[sa-fA-F0-9]{4})" | table action1, vlan_hex, mac_address, port_hex

(using the above example string) I get the following:

action1 vlan_hex mac_address port_hex
01 0041 08000f5b547e 0086
01 0041 08000f5b547e 0086
01 0041 08000f5b547e 0086
01 0041 08000f5b547e 0086

So only the first entry is output. What I'd like to see is:

action1 vlan_hex mac_address port_hex
01 0041 08000f5b547e 0086
01 0041 08000f543a1d 0004
01 0041 08000f560508 0093
01 0041 08000f7d19a0 00ad

Does anyone have any ideas? Thanks and regards, Paul.

somesoni2 · ‎07-03-2014

This seems to work for me (run anywhere example)

|gentimes start=-1 | eval string="0x01004108000f5b547e008601004108000f543a1d000401004108000f560508009301004108000f7d19a000ad00" | makemv tokenizer="([0-9a-f]{22})" string| mvexpand string | rex field=string "(?<action1>[sa-fA-F0-9]{2})(?<vlan_hex>[sa-fA-F0-9]{4})(?<mac_address>[sa-fA-F0-9]{12})(?<port_hex>[sa-fA-F0-9]{4})" | table action1, vlan_hex, mac_address, port_hex

paulrowen · ‎07-03-2014

Was that it? I was missing field=string? Great, thanks - that works.

rvany · ‎09-18-2018

For reference: yes, field=string was missing, as rex has to be told where to look for matches. Otherwise "_raw" is used by default, which may contain other data. The field extraction names (like <action1>, <vlan_hex>, ...) were also missing from your search; they are needed to give the table command something to display (and for correct rex syntax).
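
The effect of matching against the raw event instead of the expanded field can be imitated in Python. This is only a sketch of the idea, not Splunk's implementation: `raw_event` is a hypothetical stand-in for "_raw", `rows` stands for the values left after mvexpand, and the pattern uses a plain hex character class.

```python
import re

PATTERN = re.compile(
    r"([a-fA-F0-9]{2})([a-fA-F0-9]{4})([a-fA-F0-9]{12})([a-fA-F0-9]{4})"
)

# Hypothetical raw event text; only the string= value comes from the post.
raw_event = ("string=0x01004108000f5b547e008601004108000f543a1d0004"
             "01004108000f560508009301004108000f7d19a000ad00")

# One value per row, as they would look after the field has been expanded.
rows = [
    "01004108000f5b547e0086",
    "01004108000f543a1d0004",
    "01004108000f5605080093",
    "01004108000f7d19a000ad",
]

# Matching the unchanged raw text: every row finds the same first hit.
from_raw = [PATTERN.search(raw_event).groups() for _ in rows]

# Matching each row's own value: every row yields its own record.
from_field = [PATTERN.search(row).groups() for row in rows]

for raw_hit, field_hit in zip(from_raw, from_field):
    print(raw_hit, field_hit)
```

The first column repeats the first notification on every row, which is the symptom described in the question; the second column shows the per-row result.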

Furthermore, only the first capturing group is used as a value of the newly created multivalue field (from the docs). So using multiple capturing groups in the tokenizer to pick out different parts of the input string does not yield the desired result. A nice example of using more than one capturing group to clarify exactly how to match a given string can be found at https://answers.splunk.com/answers/81799/use-of-tokenizer-option-with-makemv.html
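
A rough Python imitation of that rule may help to see the difference between the two tokenizers from the question. The helper `tokenize_first_group` is hypothetical and simply keeps group 1 of every match, on the assumption that this is how the documented behaviour plays out; it is not Splunk code.

```python
import re


def tokenize_first_group(pattern: str, value: str) -> list:
    """Keep only the first capturing group of each successive match."""
    return [m.group(1) for m in re.finditer(pattern, value)]


value = ("0x01004108000f5b547e008601004108000f543a1d0004"
         "01004108000f560508009301004108000f7d19a000ad00")

FOUR_GROUPS = r"([0-9a-f]{2})([0-9a-f]{4})([0-9a-f]{12})([0-9a-f]{4})"
ONE_GROUP = r"([0-9a-f]{22})"

four = tokenize_first_group(FOUR_GROUPS, value)
one = tokenize_first_group(ONE_GROUP, value)

print(four)  # only the 2-character operation of each record survives
print(one)   # the full 22-character records
```

Under this reading, the four-group tokenizer keeps just the operation code of each record, while the single-group tokenizer keeps whole records that a later rex can split.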

What I am still wondering about is the "s" after each opening "[" in the regex. To my knowledge an "s" has no special meaning in a character set, and there won't be any "s" in a hex string (as indicated by "string=0x...").
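
A quick check in Python (used here as a stand-in for Splunk's regex engine, on the assumption that both treat an unescaped "s" inside "[...]" as a literal character) suggests the extra "s" only widens the class and changes nothing for hex-only input. The names `WITH_S` and `HEX_ONLY` are illustrative.

```python
import re

WITH_S = re.compile(r"[sa-fA-F0-9]{2}")
HEX_ONLY = re.compile(r"[a-fA-F0-9]{2}")

# The class with the extra "s" accepts a literal "s"; the plain hex class does not.
accepts_s = WITH_S.fullmatch("s1") is not None
rejects_s = HEX_ONLY.fullmatch("s1") is None

# On a pure hex record both classes produce identical matches.
record = "01004108000f5b547e0086"
same_on_hex = WITH_S.findall(record) == HEX_ONLY.findall(record)

print(accepts_s, rejects_s, same_on_hex)
```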
