Hi,
I send email data to the HTTP Event Collector in JSON format, like this:
{
"sender-domain":"domain.com",
"sender":"sender.test@domain.com",
"recipient":"Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>"
}
I would like to extract the email addresses from the recipient field and save them as a multivalue field with the same name (the recipient field will be used in the Email data model).
Do you have any idea how I can do this?
The only idea I have is to use SEDCMD to rename recipient to another field name, and then use a regex to extract the email addresses from that field into the recipient field.
The Regex is:
SOURCE_KEY = changed_recipient_field_name
REGEX = (?<recipient>[\w\d\.\-\=\+]+\@[\w\d\.\-]+)
FORMAT = recipient::$1
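To illustrate what this pattern would pull out of the sample value, here is a minimal Python sketch. It is only an approximation of Splunk's behaviour: Python spells the named group `(?P<recipient>...)` instead of `(?<recipient>...)`, the character classes are simplified, and the names `EMAIL_RE`, `raw` and `recipients` are made up for the example. In Splunk itself, the transform would probably also need `MV_ADD = true` so that every match is kept rather than only the first one.

```python
import re

# Python spelling of the REGEX above (named group uses ?P<...> in Python).
EMAIL_RE = re.compile(r"(?P<recipient>[\w.\-=+]+@[\w.\-]+)")

# Sample value copied from the event above, stray quote included.
raw = 'Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>'

# Collect every match, which is the multivalue result being asked for.
recipients = [m.group("recipient") for m in EMAIL_RE.finditer(raw)]
print(recipients)
```

The display names and angle brackets are skipped because they contain no `@`, so only the two addresses remain.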
What is the best solution for this?
Thank you in advance.
Firstly, that's not a well-formed JSON structure.
Secondly - why do you want to _change_ the field name? You can
Hi,
I have configured KV_MODE=json, so all fields are extracted automatically.
Currently the value of field recipient is:
Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>
I would like to have a multivalue recipient field with the following values (the recipient field is used by the Email data model):
Is it possible to override the field value?
If I rename the field with SEDCMD, I can extract the email addresses with a regex and create the recipient field the way I want.
What is the best solution for this? Should I create my own extraction (and not use JSON mode)?
No, SEDCMD doesn't have anything to do with it.
Remember that SEDCMD is a transform applied at ingest time, whereas KV_MODE works at search time.
Unfortunately, the order of search-time operations together with the possible address formats makes the task tricky. https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Searchtimeoperationssequence
It'd be relatively easy to create a calculated field that splits your recipient field into separate values on commas.
The problem is that you would get all the recipient data, not just the email addresses.
Unfortunately, you can't perform any more modifications on the calculated fields and separate calculated fields are performed in parallel so they are not "chainable".
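As a rough illustration of why a plain comma split is not enough, here is a small Python sketch (not Splunk configuration). The first value is the one from the question; the second, `tricky`, is a hypothetical format with a comma inside the quoted display name, which is an assumption about what real data might contain.

```python
# Value from the question, stray quote included.
value = 'Name1 Surname1<name1.surname1@domain.com>, "Name2 Surname2<name2.surname2@domain.com>'

# Naive split on commas: each part still carries the display name and brackets.
parts = [p.strip() for p in value.split(",")]
print(parts)

# Hypothetical format (assumption): a comma inside the quoted display name.
tricky = '"Surname1, Name1" <name1.surname1@domain.com>'

# The same split now cuts a single recipient into two pieces.
tricky_parts = [p.strip() for p in tricky.split(",")]
print(tricky_parts)
```

Either way the split values would need a second pass to strip everything except the address, and that second pass is exactly what cannot be chained onto a calculated field.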
So your task is relatively complicated because you have many possible formats of data in your recipient field.
You could do a regex-based extraction on the raw event (without parsing it as JSON), but that's tricky, especially if your event can be multiline.
You could try to "normalize" the format of the data at ingest time, but that's also a bit complicated, since SEDCMD is relatively simple in its functionality whereas you need a decent level of format validation.
It's of course all extractable with a proper search, but that's not what you want if you need to normalize the fields for a data model.
It's a relatively tricky task that requires careful development and checking of your edge cases. It might be worth consulting your Splunk partner or Professional Services for this one.