Hey everyone!
I'm looking at extracting multi-value fields that contain multiple MAC addresses within a single field. I know I can create and manipulate multi-value fields at search time, but I'd like to separate some of this data during ingestion. I've read through the transforms.conf and props.conf manual pages, but the language on transforming data into a multi-value field isn't very clear to me.
For instance, I'm having trouble understanding what the "::$1" characters denote when using the "FORMAT" key in my transforms.conf file. I know it has to do with RegEx capture groups, but I'm having a hard time relating that to how data is extracted and stored via a conf file, since I'm more used to using RegEx on the command line with grep.
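To make the "::$1" part concrete: as I read the transforms.conf spec, FORMAT for an index-time extraction takes the form <field-name>::<field-value>, where the text before "::" is the field name, the text after it is the value, and $1 is replaced by whatever capture group 1 matched. The Python sketch below is only a mental model of that substitution, not Splunk's implementation. The apply_format helper is hypothetical, and the MAC pattern is one I picked for illustration.

```python
import re


def apply_format(fmt: str, match: re.Match) -> tuple:
    """Hypothetical model of FORMAT = <name>::<value>.

    Replaces each $N with capture group N from the match, then splits
    the filled-in template on the first '::' into (field name, value).
    """
    filled = re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))) or "", fmt)
    name, _, value = filled.partition("::")
    return name, value


# Illustrative MAC pattern: group 1 is the whole address.
pattern = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))")
raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

first = pattern.search(raw)
result = apply_format("mv_macaddress::$1", first)
print(result)  # ('mv_macaddress', 'ad:00:12:af:21:31')
```

So "mv_macaddress::$1" reads as "create a field called mv_macaddress whose value is capture group 1". Group numbering counts opening parentheses from the left, which is why the outermost group is $1 here.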
Would someone kindly help me with an example of how I would extract MAC addresses as a multi-value field? It would really help me bridge the gap in my understanding, as it's been hard to find concrete examples of how to do this online.
In the CSV file, the "MAC_address" field has the data encapsulated in quotes like this: "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5, {so on}". So I'm thinking my conf files need to look something like this:
props.conf
#sourcetype
[mac_addy]
TRANSFORMS-mv_macaddress = mv_macaddress
transforms.conf
[mv_macaddress]
SOURCE_KEY=MAC_Address
REGEX=(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)
FORMAT=mv_macaddress::$1
MV_ADD=true
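One quick way to sanity-check a REGEX before it goes into transforms.conf is to run it against a sample value outside Splunk. This Python sketch runs the pattern from the config above against the sample MAC_address value from the question (with "{so on}" dropped). It assumes Python's re behaves like Splunk's PCRE for these simple constructs, which I believe holds here but is an assumption.

```python
import re

# Pattern from the config above: uppercase-only hex, plus a mandatory trailing comma.
original = re.compile(r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)")

# Sample value from the question, without the "{so on}" placeholder.
raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

# Lowercase hex never matches [0-9A-F], so nothing is extracted.
values = [m.group(1) for m in original.finditer(raw)]
print(values)

# Even with uppercase input, the last MAC has no trailing comma and is missed,
# and the one that does match keeps its comma inside group 1.
upper_values = [m.group(1) for m in original.finditer(raw.upper())]
print(upper_values)
```

Both problems (case and the required comma) are what the suggested fix in the replies addresses.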
Would someone please confirm that I'm on the right path or point out any problems with my configuration?
I also have fields in the CSV file that contain multiple repetitions of the same string (a URL), and I would like to deduplicate them so that random entries don't take up two page lengths of a webpage. Almost all of my Google searches for "Splunk deduplicate" point me to results about the search-time command "dedup" or how to use crcSalt on my inputs to prevent duplicate whole entries.
Is there any way I can define a string for a field and have Splunk drop the repeats or collapse that field into one line, so I don't have dozens (literally dozens of them!) of repetitions like "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/"?
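Nobody in the thread answers the dedup question, so here is only a sketch of the regex logic, checked in Python. The idea is a backreference: capture one whitespace-delimited token and swallow every immediate repeat of it. props.conf does have SEDCMD-<class> for sed-style substitutions at index time, but whether a backreference inside the pattern works there is an assumption I have not verified, so test it on a dev instance first. The DEDUP name is hypothetical.

```python
import re

# (?<!\S)      start only at a token boundary
# (\S+)        capture one whitespace-delimited token
# (?:\s+\1(?!\S))+  consume each immediately repeated copy of that exact token
DEDUP = re.compile(r"(?<!\S)(\S+)(?:\s+\1(?!\S))+")

raw = ("https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ "
       "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/")

collapsed = DEDUP.sub(r"\1", raw)
print(collapsed)  # https://icanhas.cheezburger.com/

# Alternative if you preprocess the CSV before ingestion: drop every repeat,
# adjacent or not, while keeping first-seen order.
unique = " ".join(dict.fromkeys(raw.split()))
```

The boundary lookarounds matter: without them, a token like "aa" could wrongly match the front of "aab", or start matching in the middle of "xaa".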
I appreciate your help. Thank you!
Missed it by >that< much; just change to this:
REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))
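To see what the corrected pattern yields on the sample value, this Python sketch collects group 1 of every non-overlapping match, which is roughly what a repeating extraction turns into a multi-value field. One thing worth double-checking against the transforms.conf spec for your version: as I recall it, MV_ADD is documented as a search-time setting, while REPEAT_MATCH (along with WRITE_META) is the index-time counterpart, which matters because TRANSFORMS- classes run at index time.

```python
import re

# Corrected pattern from the answer: upper- or lowercase hex, no comma required.
fixed = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))")

raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

# Group 1 of each match becomes one value of the multi-value field.
mv_macaddress = [m.group(1) for m in fixed.finditer(raw)]
print(mv_macaddress)  # ['ad:00:12:af:21:31', '00:fd:aa:23:d1:a5']
```

finditer is used rather than findall because, with three capture groups, findall would return tuples of all groups instead of just $1.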
Did this work, @zanb? Be sure to come back and comment or click Accept.
Thanks for your help!
I'm guessing I also need to use the DELIMS key in my transforms?
You don't need DELIMS when you have REGEX.
I think you need to change the REGEX line to:
REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)
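For comparison, here is a Python check of the [,]? variant against the pattern without a comma. Both find every MAC in the sample, but because the comma sits inside the outer group, $1 carries a trailing comma on every value except the last. If the goal is to consume the comma without storing it, one option (my suggestion, not from the thread) is to move it outside group 1, as in the third pattern below.

```python
import re

raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

optional_comma = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)")
no_comma = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))")
# Suggested variant: comma matched optionally but kept out of group 1.
comma_outside = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}),?")

with_opt = [m.group(1) for m in optional_comma.finditer(raw)]
without = [m.group(1) for m in no_comma.finditer(raw)]
outside = [m.group(1) for m in comma_outside.finditer(raw)]

print(with_opt)  # ['ad:00:12:af:21:31,', '00:fd:aa:23:d1:a5']
print(without)   # ['ad:00:12:af:21:31', '00:fd:aa:23:d1:a5']
print(outside)   # ['ad:00:12:af:21:31', '00:fd:aa:23:d1:a5']
```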