
How do I override field extraction from JSON?

mk1294splunk
Observer

Hi,

I send email data to the HTTP Event Collector in JSON format, like this:

{
"sender-domain":"domain.com",
"sender":"sender.test@domain.com",
"recipient":"Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>"
}

I would like to extract the email addresses from the recipient field and save them as a multivalue field with the same name (the recipient field will be used in the Email data model).

Do you have any idea how I can do this?

The only idea I have is to use SEDCMD to rename recipient to another field name, and then use a regex to extract the email addresses from that field into the recipient field.

The regex is:

SOURCE_KEY = changed_recipient_field_name
REGEX = (?<recipient>[\w\d\.\-\=\+]+\@[\w\d\.\-]+)
FORMAT = recipient::$1

What is the best solution for this?

Thank you in advance.

 


PickleRick
SplunkTrust

Firstly, that's not a well-formed JSON structure.

Secondly - why do you want to _change_ the field name? You can


mk1294splunk
Observer

Hi, 

I have configured KV_MODE=json, so all fields are extracted automatically.

Currently, the value of the recipient field is:

Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>

I would like to have a multivalue recipient field with the following values (the recipient field is used by the Email data model):

name1.surname1@domain.com

name2.surname2@domain.com

Is it possible to override the field value?

If I change the field name to something else with SEDCMD, I can extract the email addresses with a regex and create the recipient field the way I want.

What is the best solution for this? Should I create my own extraction (and not use JSON mode)?


PickleRick
SplunkTrust

No, SEDCMD doesn't have anything to do with it.

Remember that SEDCMD is applied at ingest time, whereas KV_MODE works at search time.

Unfortunately, the order of search-time operations, together with the possible address formats, makes the task tricky. See https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Searchtimeoperationssequence

It'd be relatively easy to create a calculated field to split your recipient field into separate values on commas.

The problem is that you would get all the recipient data, not just the email addresses.
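To illustrate the calculated-field idea, here is a minimal sketch of what it could look like in props.conf on the search head. The stanza name my_email_sourcetype is a placeholder I made up, not something from your setup, and I haven't tested this against your data.

```
# props.conf (search-time settings)
# ASSUMPTION: "my_email_sourcetype" is a hypothetical sourcetype name; use your own.
[my_email_sourcetype]
KV_MODE = json

# Calculated field: replaces the JSON-extracted recipient value with a
# multivalue field, one value per comma-separated entry.
EVAL-recipient = split(recipient, ",")
```

With your sample event, each value would still contain the display name and the angle brackets (plus any leading space or stray quote after the comma), which is exactly the limitation described above.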

Unfortunately, you can't perform any further modifications on calculated fields, and separate calculated fields are evaluated in parallel, so they are not "chainable".

So your task is relatively complicated because you have many possible formats of data in your recipient field.

You could do a regex-based extraction on the raw event (not parsing it as JSON), but it's tricky, especially if your event can be multiline.

You could try to "normalize" the format of the data at ingest time, but that's also a bit complicated, since SEDCMD is relatively simple in its functionality whereas you need a decent level of format validation.

It's of course all extractable by performing a proper search, but that's not what you want if you want to normalize the fields for the data model.

It's a relatively tricky task that requires some careful development and checking of your edge cases. It might be worth consulting your Splunk partner or Professional Services for this one.
