How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

zanb — Tue, 13 Nov 2018 15:16:44 GMT

Hey everyone!

I'm trying to extract a field that contains multiple MAC addresses as a multi-value field. I know I can create and manipulate multi-value fields at search time, but I'd like to separate some of this data during ingestion. I've read through the transforms.conf and props.conf manual pages, but the language on transforming data into a multi-value field isn't very clear to me.

For instance, I'm having trouble understanding what the "::$1" characters denote when using the "FORMAT" key in my transforms.conf file. I know it has to do with RegEx capture groups, but I'm having a hard time seeing how data is extracted and stored via a conf file, as I'm more used to using RegEx on the command line with grep.

Would someone kindly help me with an example of how I would extract MAC addresses as a multi-value field? It would really help me bridge the gap in my understanding, as it's been hard to find concrete examples of how to do this online.

In the CSV file, the "MAC_address" field has the data encapsulated in quotes like this: "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5, {so on}". So I'm thinking my conf files need to look something like this:

props.conf

#sourcetype
[mac_addy]
TRANSFORMS-mv_macaddress = mv_macaddress

transforms.conf

[mv_macaddress]
SOURCE_KEY=MAC_Address
REGEX=(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)
FORMAT=mv_macaddress::$1
MV_ADD=true

Would someone please confirm that I'm on the right path or point out any problems with my configuration?

I also have fields in the CSV file that contain many repetitions of the same string (a URL), and I would like to deduplicate them so that a single entry doesn't take up two page lengths of a webpage. Almost all of my Google searches for "Splunk deduplicate" point me to results about the search-time command "dedup" or how to use crcSalt to prevent duplicate whole entries.

Is there any way I can define a string for a field and have Splunk drop the repeats or collapse that field into one value, so I don't have dozens (literally dozens of them!) of iterations of "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/"?

I appreciate your help. Thank you!
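For illustration only: the thread does not give a Splunk-side answer to the deduplication part, so the sketch below assumes the CSV could be pre-processed before Splunk reads it. It collapses repeated whitespace-separated values in one field while keeping first-seen order. The function name `dedupe_values` is hypothetical, not a Splunk setting or API.

```python
def dedupe_values(field_value):
    """Collapse repeated whitespace-separated values, keeping first-seen order.

    Hypothetical pre-processing helper; not part of Splunk.
    """
    seen = set()
    unique = []
    for item in field_value.split():
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return " ".join(unique)


# The repeated URL from the question, joined by spaces as in the CSV field.
urls = " ".join(["https://icanhas.cheezburger.com/"] * 4)
deduped = dedupe_values(urls)
print(deduped)
```

With the four repeated URLs above, `deduped` holds a single copy of the URL. Whether pre-processing is acceptable depends on whether the raw file has to be indexed unmodified.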

Re: How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

zanb — Tue, 13 Nov 2018 15:45:18 GMT

I'm guessing I also need to use the DELIMS key in my transforms?

Re: How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

richgalloway — Thu, 22 Nov 2018 14:15:02 GMT

You don't need DELIMS when you have REGEX.

I think you need to change the REGEX line to REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?).

Re: How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

woodcock — Thu, 22 Nov 2018 17:31:24 GMT

Missed it by >that< much; just change to this:

REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))
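To see how the three REGEX variants in this thread differ, here is a rough sketch that runs them against the sample value from the question using Python's `re` module. Assumptions: Python's regex engine stands in for the one Splunk uses (the two agree on these simple patterns, but this is not Splunk itself), and the `extract` helper only imitates what `FORMAT=mv_macaddress::$1` with `MV_ADD=true` is meant to do, namely store capture group 1 of every match under the field name on the left of `::`.

```python
import re

# Sample value as given in the question (lowercase hex, comma-separated).
sample = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

patterns = {
    # Original attempt: uppercase hex only, trailing comma required.
    "original": r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)",
    # First suggestion: either case, trailing comma optional but captured.
    "optional_comma": r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)",
    # Second suggestion: either case, comma left out of the capture.
    "no_comma": r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))",
}


def extract(pattern, text, field="mv_macaddress"):
    """Imitate FORMAT=<field>::$1 with MV_ADD=true (illustrative only):
    every match appends its capture group 1 to the named field."""
    return {field: [m.group(1) for m in re.finditer(pattern, text)]}


results = {name: extract(p, sample) for name, p in patterns.items()}
for name, fields in results.items():
    print(name, fields)
```

On this sample the original pattern matches nothing, because the hex digits are lowercase. The optional-comma pattern finds both addresses but keeps the comma in the first value ("ad:00:12:af:21:31,"), since the comma sits inside group 1. The last pattern yields two clean values, which is why dropping the comma from the expression is the fix.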

Re: How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

woodcock — Mon, 26 Nov 2018 21:25:57 GMT

Did this work, @zanb? Be sure to come back and comment or click Accept.

Re: How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

zanb — Mon, 26 Nov 2018 21:31:05 GMT

Thanks for your help!
