Solved: How do you extract multi-value fields at ingestion...

zanb · ‎11-13-2018

Hey everyone!

I'm looking at extracting multi-value fields where a single field contains multiple MAC addresses. I know I can create and manipulate multi-value fields at search time, but I'd like to separate some of this data during ingestion. I've read through the transforms.conf and props.conf manual pages, but the language on transforming data into a multi-value field isn't very clear to me.

For instance, I'm having trouble understanding what the "::$1" characters denote when using the "FORMAT" key in my transforms.conf file. I know it has to do with RegEx capture groups, but I'm just having a hard time relating how data is extracted and stored via a conf file, as I'm more used to using RegEx on the command line with grep.

Would someone kindly help me with an example of how I would extract MAC addresses as a multi-value field? It would really help bridge the gap in my understanding, as it's been hard to find concrete examples of how to do this online.

In the CSV file, the "MAC_address" field has the data encapsulated in quotes like this: "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5, {so on}". So I'm thinking my conf files need to look something like this:

props.conf

#sourcetype
[mac_addy]
TRANSFORMS-mv_macaddress = mv_macaddress

transforms.conf

[mv_macaddress]
SOURCE_KEY=MAC_Address
REGEX=(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)
FORMAT=mv_macaddress::$1
MV_ADD=true

Would someone please confirm that I'm on the right path or point out any problems with my configuration?

I also have fields in the CSV file that contain many repetitions of the same string (a URL), and I would like to deduplicate them so that a single entry doesn't take up two page lengths of a webpage. Almost all of my Google searches for "Splunk deduplicate" point me to results about the search-time command "dedup" or how to use crcSalt to prevent whole entries from being duplicated.

Is there any way I can define a string for a field and have Splunk drop the repeats or collapse that field into a single value, so I don't have dozens (literally dozens of them!) of iterations of "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/"?

I appreciate your help. Thank you!

woodcock · ‎11-22-2018

Missed it by >that< much; just change to this:

REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))
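
To see why the change matters, here is a minimal sketch in Python that runs both patterns against the sample value from the question. It is only an approximation: Splunk uses PCRE rather than Python's re module (the two should behave the same for patterns this simple), and the extract() helper is a hypothetical stand-in that imitates what FORMAT=mv_macaddress::$1 does with capture group 1 when every match is kept. It says nothing about the rest of the props.conf/transforms.conf setup.

```python
import re

# Sample field value from the question (the "{so on}" placeholder is omitted).
raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

# REGEX from the question: upper-case hex only, and a trailing comma is required.
original = re.compile(r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)")

# REGEX from this answer: both cases accepted, no trailing comma required.
corrected = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))")


def extract(pattern, text, field="mv_macaddress"):
    """Hypothetical helper: emit one 'field::value' pair per match,
    where value is capture group 1 (the $1 in FORMAT)."""
    return [f"{field}::{m.group(1)}" for m in pattern.finditer(text)]


original_values = extract(original, raw)
corrected_values = extract(corrected, raw)

print(original_values)   # nothing matches: the sample is lower-case
print(corrected_values)  # one entry per MAC address
```

The original pattern fails on this sample for two separate reasons: the hex digits are lower-case, and the last address in the list has no comma after it, so even upper-case data would lose its final value.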

woodcock · ‎11-26-2018

Did this work, @zanb? Be sure to come back and comment or click Accept.

zanb · ‎11-26-2018

Thanks for your help!

zanb · ‎11-13-2018

I'm guessing I also need to use the DELIMS key in my transforms?

richgalloway · ‎11-22-2018

You don't need DELIMS when you have REGEX.

I think you need to change the REGEX line to this:

REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)
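
One thing worth checking with the optional-comma variant: because [,]? sits inside the outermost capture group, $1 may include the trailing comma. A rough sketch in Python (again only a stand-in for Splunk's PCRE engine, using the sample value from the question) illustrates this:

```python
import re

# Sample field value from the question (the "{so on}" placeholder is omitted).
raw = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

# Variant with an optional trailing comma inside capture group 1.
optional_comma = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)")

# Group 1 is what $1 would refer to in FORMAT.
values = [m.group(1) for m in optional_comma.finditer(raw)]

# Stripping the comma afterwards gives the bare addresses.
cleaned = [v.rstrip(",") for v in values]

print(values)
print(cleaned)
```

In this sketch the first captured value keeps its comma while the last one does not, so leaving the comma out of the group avoids the cleanup step.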

---
If this reply helps you, Karma would be appreciated.
