Splunk Search

How do you extract multi-value fields at ingestion & deduplicate other multi-value fields?

zanb
Path Finder

Hey everyone!

I'm looking at extracting multi-value fields where a single field contains multiple MAC addresses. I know I can create and manipulate multi-value fields at search time, but I'd like to separate some of this data during ingestion. I've read through the transforms.conf and props.conf manual pages, but the language on transforming data into a multi-value field isn't very clear to me.

For instance, I'm having trouble understanding what the "::$1" characters denote when using the "FORMAT" key in my transforms.conf file. I know it has to do with RegEx capture groups, but I'm just having a hard time relating how data is extracted and stored via a conf file, as I'm more used to using RegEx on the command line with grep.

Would someone kindly help me with an example of how I would extract MAC addresses as a multi-value field? It would really help me bridge the gap in my understanding, as it's been hard to find concrete examples of how to do this online.

In the CSV file, the "MAC_address" field has the data encapsulated in quotes like this: "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5, {so on}". So I'm thinking my conf files need to look something like this:

props.conf

#sourcetype
[mac_addy]
TRANSFORMS-mv_macaddress = mv_macaddress

transforms.conf

[mv_macaddress]
SOURCE_KEY=MAC_Address
REGEX=(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)
FORMAT=mv_macaddress::$1
MV_ADD=true

Would someone please confirm that I'm on the right path or point out any problems with my configuration?

I also have fields in the CSV file that contain multiple repetitions of the same string (a URL), and I would like to deduplicate them so that random entries don't take up two page lengths of a webpage. Almost all of my Google searches for "Splunk deduplicate" point me to results about the search-time command "dedup" or how to use crcSalt on my input to prevent duplicate whole entries.

Is there any way I can define a string for a field and have Splunk drop the repeats or collapse that field into one entry, so I don't have dozens (literally dozens of them!) of repetitions of "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/"?

I appreciate your help. Thank you!
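
For the deduplication part of the question, one option (an assumption, since nobody in the thread answers it) is to clean the field before Splunk ever reads the CSV. The sketch below is plain Python, not a Splunk setting; the function name dedupe_tokens is hypothetical, and it assumes the repeated URLs are separated by whitespace as in the quoted example.

```python
def dedupe_tokens(value: str) -> str:
    """Collapse repeated whitespace-separated tokens, keeping first-seen order."""
    # dict.fromkeys keeps insertion order and drops later duplicates.
    unique = dict.fromkeys(value.split())
    return " ".join(unique)


# Example modeled on the repeated URL quoted in the question.
repeated = (
    "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/ "
    "https://icanhas.cheezburger.com/ https://icanhas.cheezburger.com/"
)
collapsed = dedupe_tokens(repeated)
print(collapsed)
```

Running this kind of cleanup on the relevant CSV column before ingestion would leave one copy of each distinct URL in the field; whether preprocessing is acceptable depends on how the file reaches Splunk.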

woodcock
Esteemed Legend

Missed it by >that< much; just change to this:

REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))
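
To see what the change does, here is a minimal Python sketch. It uses Python's re module as a stand-in for Splunk's regex engine (an assumption; the two behave alike for a pattern this simple), and the sample strings are adapted from the question. It lists what capture group 1, the group that $1 in FORMAT=mv_macaddress::$1 refers to, picks up under each pattern.

```python
import re

# Sample values modeled on the question's CSV field (lowercase and uppercase hex).
sample_lower = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"
sample_upper = "AD:00:12:AF:21:31, 00:FD:AA:23:D1:A5"

# Pattern from the question: uppercase hex only, and at least one comma required.
original = re.compile(r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2})[,]+)")

# Pattern from the answer: either case, no comma required.
corrected = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2}))")


def group_one(pattern, text):
    """Return capture group 1 for every match, in order."""
    return [m.group(1) for m in pattern.finditer(text)]


lower_original = group_one(original, sample_lower)
upper_original = group_one(original, sample_upper)
lower_corrected = group_one(corrected, sample_lower)
upper_corrected = group_one(corrected, sample_upper)

print(lower_original)   # []
print(upper_original)   # ['AD:00:12:AF:21:31,']
print(lower_corrected)  # ['ad:00:12:af:21:31', '00:fd:aa:23:d1:a5']
print(upper_corrected)  # ['AD:00:12:AF:21:31', '00:FD:AA:23:D1:A5']
```

The original pattern misses lowercase addresses entirely, and even with uppercase input it skips the last address (no trailing comma) and keeps the comma inside the captured value. The corrected pattern returns every address cleanly.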

woodcock
Esteemed Legend

Did this work, @zanb? Be sure to come back and comment or click Accept.

zanb
Path Finder

Thanks for your help!

zanb
Path Finder

I'm guessing I also need to use the DELIMS key in my transforms?

richgalloway
SplunkTrust

You don't need DELIMS when you have REGEX.

I think you need to change the REGEX line to REGEX=(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?).
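
One detail worth checking with this variant is where the optional comma sits relative to capture group 1. The Python sketch below (again using Python's re as a stand-in for Splunk's engine, which is an assumption) compares the pattern as written with a hypothetical variant, named comma_outside here, that moves the comma outside the outer group.

```python
import re

# Sample value modeled on the question's CSV field.
sample = "ad:00:12:af:21:31, 00:fd:aa:23:d1:a5"

# Pattern as suggested in the reply: the optional comma is inside group 1.
comma_inside = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})[,]?)")

# Hypothetical variant (not from the thread): optional comma after group 1.
comma_outside = re.compile(r"(([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})),?")

inside_values = [m.group(1) for m in comma_inside.finditer(sample)]
outside_values = [m.group(1) for m in comma_outside.finditer(sample)]

print(inside_values)   # ['ad:00:12:af:21:31,', '00:fd:aa:23:d1:a5']
print(outside_values)  # ['ad:00:12:af:21:31', '00:fd:aa:23:d1:a5']
```

With the comma inside the group, the first value keeps its trailing comma, so a FORMAT that uses $1 would store it as part of the MAC address. Leaving the comma out of the pattern, as in the accepted answer, or placing it outside the group avoids that.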
