Splunk Enterprise

How to extract multi-value fields using Field Extractor

dtaylor
Path Finder

As per the subject, I'm attempting to convert a rex expression in my search into a proper field extraction using the Field Extractor so I can drop the rex and use the field in my base search directly. The rex expression works perfectly but requires max_match=0 in order to get all the results. Unless I'm mistaken (which is very possible), I don't have that option available in the Field Extractor, and because of that, the regex only picks up one value instead of multiple. I've tested the regex on regex101, and it works fine there, grabbing all the values properly. It's only in the Field Extractor that it fails to capture every value. The rex expression itself runs against a specific field rather than _raw, but the search also works when run on _raw (I verified).

The rex expression is placed below followed by the regex itself.

rex field=AttachmentDetails max_match=0 "(?:'(?<attachments>.*?)'.*?'fileHash': '(?<sha256>\w+)'}.*?\{.*?\}\}[,\}]\s?)"

(?:'(?<attachments>.*?)'.*?'fileHash': '(?<sha256>\w+)'}.*?\{.*?\}\}[,\}]\s?)

Below, I've placed some test data you can use on regex101 to verify the expression above. It captures both fields on the site, but just not in Splunk, and I can't tell why. Perhaps I've misunderstood how grouping works in regex.



orci eget eros faucibus tincidunt. Duis leo. Sed fringilla mauris sit amet nibh. Donec sodales sagittis magna. Sed consequat, leo eget bibendum sodales, augue velit cursus nunc, {'NotSecrets!!.txt': 'fileHash': 'a3b9adaee5b83973e8789edd7b04b95f25412c764c8ff29d0c63abf25b772646'}, {}}, 'Secrets!!.txt': 'fileHash': 'c092a4db704b9c6f61d6a221b8f0ea5f719e7f674f66fede01a522563687d24b'}, {}}} orci eget eros faucibus tincidunt. Duis leo. Sed fringilla mauris sit amet nibh. Donec sodales sagittis magna. Sed consequat, leo eget bibendum sodales, augue velit cursus nunc,

isoutamo
SplunkTrust

Hi

I didn't get why you cannot use the rex which is already working. Personally, I always prefer to write my own rex rather than use the ones created by the Field Extractor.

It's Splunk's design decision that if there are multiple matches, they are put into mv fields.

You can always expand those into individual events if mv fields are not suitable for your use case.

| makeresults
| eval _raw = "orci eget eros faucibus tincidunt. Duis leo. Sed fringilla mauris sit amet nibh. Donec sodales sagittis magna. Sed consequat, leo eget bibendum sodales, augue velit cursus nunc, {'NotSecrets!!.txt': 'fileHash': 'a3b9adaee5b83973e8789edd7b04b95f25412c764c8ff29d0c63abf25b772646'}, {}}, 'Secrets!!.txt': 'fileHash': 'c092a4db704b9c6f61d6a221b8f0ea5f719e7f674f66fede01a522563687d24b'}, {}}} orci eget eros faucibus tincidunt. Duis leo. Sed fringilla mauris sit amet nibh. Donec sodales sagittis magna. Sed consequat, leo eget bibendum sodales, augue velit cursus nunc,"
| rex max_match=0 "(?:'(?<attachments>.*?)'.*?'fileHash': '(?<sha256>\w+)'}.*?\{.*?\}\}[,\}]\s?)"
| eval foo = mvzip(attachments,sha256,";-;")
| mvexpand foo
| eval foo=split(foo,";-;") 
| eval attachments=mvindex(foo,0) 
| eval sha256=mvindex(foo,1)
| table attachments sha256

r. Ismo

dtaylor
Path Finder

MV fields are fine. In fact, that's how it extracts when using rex directly. In this case, though, despite using the *exact* same regex, it only extracts the first of the attachments in the dummy data when put in as a proper field using the Field Extractor.

That said, the regex is made by myself. Splunk didn't generate it. I put it in manually using the field extractor.

I'm trying to have the fields extracted because, aside from being useful data, I want to use the field in my base search to say something like attachments=*, but obviously I can't do that before I extract it with rex...

PickleRick
SplunkTrust

The field extractor is a feature which admittedly looks good and is a "selling feature" - you can show a potential customer that you don't have to be a master of regexes to be able to extract fields from data. And it might be useful if you have a Splunk Free instance at home processing negligible amounts of data and it doesn't matter to you how "pretty" and efficient the resulting extractions are.

But it of course doesn't cover all possible use cases, like your multivalue fields or a tokenizer.

I'd have to double-check but you might be able to reach more advanced settings either via directly editing transforms in the fields extraction section of the configuration menu or via the "all configurations" section.

dtaylor
Path Finder

Gotcha. I'll admit, I hope you're mistaken and the Field Extractor can properly extract multivalue fields... I say this because I just use Splunk. Unfortunately, I don't have any access to the actual conf files on the server beyond what can be edited in the Web UI.

PickleRick
SplunkTrust

OK. It seems that the Field Extractor only creates inline extractions. If you want to create transform-based extractions, you need to create them from the Settings menu.

Settings -> Fields -> Field transformations - there you can create a new transform and check the "create multivalued fields" option.

Then you can reference that transform when creating an extraction in Settings -> Fields -> Field extractions.

dtaylor
Path Finder

You're an absolute genius! Thank you so much. I knew there had to be something I was missing. It was as simple as you say. I made the transform using the regex I already knew worked and then referenced it in a field extraction. Worked like a charm.

isoutamo
SplunkTrust

Ok, now I understand your requirements. You can do it with props.conf & transforms.conf too. 

MV_ADD = <boolean>
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls what the extractor does when it finds a field which
  already exists.
* If set to true, the extractor makes the field a multivalued field and
  appends the newly found value, otherwise the newly found value is
  discarded.
* Default: false

This parameter adds multiple values to an mv field.
I haven't used the Field Extractor enough to recall whether it has an option to do the same, but I think MV_ADD is your solution.
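
A minimal sketch of what that could look like in the conf files, reusing the regex from the original post. The stanza name attachment_hashes and the sourcetype your_sourcetype are placeholders made up for illustration, so substitute your own. The extraction here runs against _raw, which the original poster confirmed works; the commented-out SOURCE_KEY line is an assumption about how to target the field instead.

```
# transforms.conf
# Stanza name is a hypothetical placeholder.
[attachment_hashes]
# Runs against _raw by default. Uncomment to target the field instead
# (assumes AttachmentDetails is already extracted at this point).
# SOURCE_KEY = AttachmentDetails
REGEX = (?:'(?<attachments>.*?)'.*?'fileHash': '(?<sha256>\w+)'}.*?\{.*?\}\}[,\}]\s?)
# Append each further match as another value instead of discarding it.
MV_ADD = true

# props.conf
# Sourcetype name is a hypothetical placeholder.
[your_sourcetype]
# Search-time extraction that references the transform above.
REPORT-attachment_hashes = attachment_hashes
```

With something like this in place, attachments and sha256 should come back as multivalue fields without any rex in the search, the same result as a transform created through Settings -> Fields -> Field transformations.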
