Splunk Search

How to edit my configurations to extract a multivalue field from an extracted field?

Communicator

I am trying to extract fields for OpenDNS logs.
These come in a CSV format:

  "2015-01-01 20:39:57","client1","client1,site1","1.1.1.1","2.2.2.2","Allowed","1 (A)","NOERROR","www.google.com.","Search Engines"

The challenge here is that fields "identities" and "categories" are often multi-valued (also comma-separated).
I went off the idea from here: https://answers.splunk.com/answers/112311/multi-value-field-extraction.html

  1. Extract all the main fields
  2. Do a second transform to extract the multi-values

The first part works fine:

**props.conf:**
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3

**transforms.conf:**
[opendns_aws_s3]
DELIMS = ","
FIELDS = timestamp,granular_id,identities,internal_ip,external_ip,action,query_type,resp_code,domain,categories
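
To see why a second step is needed, here is a rough Python sketch of what the first transform does to the sample event. It uses Python's `csv` module as a stand-in for Splunk's DELIMS handling, which is an assumption for illustration only; the field names are copied from the FIELDS setting above.

```python
import csv
import io

# Field names copied from the FIELDS setting above.
FIELDS = [
    "timestamp", "granular_id", "identities", "internal_ip", "external_ip",
    "action", "query_type", "resp_code", "domain", "categories",
]

# Sample event from the question.
SAMPLE = (
    '"2015-01-01 20:39:57","client1","client1,site1","1.1.1.1","2.2.2.2",'
    '"Allowed","1 (A)","NOERROR","www.google.com.","Search Engines"'
)

def parse_event(line):
    """Split one quoted CSV line and pair each value with its field name."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, row))

event = parse_event(SAMPLE)
print(event["identities"])  # client1,site1 (one string, inner comma not split)
print(event["categories"])  # Search Engines
```

The quoted inner commas survive the first pass, so "identities" and "categories" each arrive as one string that still has to be split.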

But at this point "identities" and "categories" are still not split into separate values.
So I added a second transform, to work on the categories field:

**props.conf:**
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3
REPORT-opendns-category = opendns_aws_s3_category

**transforms.conf:**
[opendns_aws_s3_category]
SOURCE_KEY=categories
DELIMS = ","
FIELDS = category
MV_ADD=true

Here I did something wrong, because this isn't working. I get no new field named "category", and the "categories" field is unchanged.
Should I maybe not have added the FIELDS= entry? This was to name the new field. But that was perhaps not a good idea?
How else can I name this as a new field?

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Esteemed Legend

Try this:

props.conf:

[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3, opendns_aws_s3_category

transforms.conf:

[opendns_aws_s3]
DELIMS = ","
FIELDS = timestamp,granular_id,identities,internal_ip,external_ip,action,query_type,resp_code,domain,categories

[opendns_aws_s3_category]
SOURCE_KEY=categories
REGEX = ([^,]+)(?:,|$)
FORMAT = category::$1
MV_ADD=true
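
A quick way to check what that REGEX captures outside Splunk is to run the same pattern in Python. This is only a sketch: Python's `re` module stands in for Splunk's regex engine, and collecting every match in a list is my approximation of what MV_ADD=true does. The helper name `split_values` is made up for the example, and the sample values come from this thread.

```python
import re

# Same pattern as the REGEX setting above.
CATEGORY_RE = re.compile(r"([^,]+)(?:,|$)")

def split_values(raw):
    """Return every capture-group match, one list entry per value."""
    return CATEGORY_RE.findall(raw)

print(split_values("Software/Technology,Business Services"))
# ['Software/Technology', 'Business Services']
print(split_values("Search Engines"))
# ['Search Engines']
```

Spaces and slashes inside a value are kept, because the capture group only stops at a comma or at the end of the string.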

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Communicator

Thanks, I'll run that.
I expect regex will do the trick.
I was kinda hoping that, since Splunk has a built-in mechanism for handling delimited values, that would be the obvious and most efficient choice.

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Communicator

Problem solved: I found the answer in your post, but in a different part than you might have intended...
I changed the props.conf stanza so that both transforms are listed on the same REPORT line.

That did it!

So, thanks for clearing up my syntax mistake 🙂

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Esteemed Legend

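Yes. Otherwise the REPORT classes are processed in alphabetical order of their names, and your order was wrong: "opendns-category" sorts before "opendns-fields" (c comes before f), so the category transform ran before the "categories" field had been extracted.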
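
The ordering is easy to check with a few lines of Python. This is a toy model, not Splunk internals: it assumes, as described in this reply, that separate REPORT classes run in sorted order of the class name (the part after "REPORT-").

```python
# Class names are the part after "REPORT-" in props.conf,
# mapped to the transform each one calls.
classes = {
    "opendns-fields": "opendns_aws_s3",
    "opendns-category": "opendns_aws_s3_category",
}

# Assumed model: separate REPORT classes run in sorted order of class name.
for name in sorted(classes):
    print(f"REPORT-{name} -> {classes[name]}")
```

Under that assumption the category transform is listed first, which is why putting both transforms on one REPORT line, in the order you want, avoids the problem.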

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Communicator

This was solved by clearing up the props.conf stanza:

This doesn't work:

[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3
REPORT-opendns-category = opendns_aws_s3_category

This works:

[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3, opendns_aws_s3_category 

Thanks to woodcock for the right syntax.

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Super Champion

Is the "categories" field split by doublequote-comma-doublequote or by just a comma? A few more examples with multiple values would be great.

Re: How to edit my configurations to extract a multivalue field from an extracted field?

Communicator

It's just the comma. Only the original field is enclosed in quotes.

Values vary a lot, some domains fit into 4-5 categories. Actual values may contain spaces and slashes.
Could be stuff like:
"Software/Technology,Business Services" (2 categories)
"Adult Themes,Nudity,Pornography,Sexuality" (4 categories)

(disappointingly, that last one shows up frequently just because we have a monitor running to confirm the filter is in place... sad, I know 😉 )
