I am trying to extract fields for OpenDNS logs.
These come in a CSV format:
"2015-01-01 20:39:57","client1","client1,site1","1.1.1.1","2.2.2.2","Allowed","1 (A)","NOERROR","www.google.com.","Search Engines"
The challenge here is that fields "identities" and "categories" are often multi-valued (also comma-separated).
I went off the idea from here: https://answers.splunk.com/answers/112311/multi-value-field-extraction.html
The first part works fine:
**props.conf:**
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3
**transforms.conf:**
[opendns_aws_s3]
DELIMS = ","
FIELDS = timestamp,granular_id,identities,internal_ip,external_ip,action,query_type,resp_code,domain,categories
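For illustration only, here is a rough Python sketch (not Splunk itself) of what this delimiter-based transform is expected to yield for the sample event. The field names are copied from the `FIELDS` setting above; using Python's `csv` module as a stand-in for Splunk's quote handling is an assumption.

```python
import csv

# Field names copied from the FIELDS setting in transforms.conf.
FIELDS = [
    "timestamp", "granular_id", "identities", "internal_ip", "external_ip",
    "action", "query_type", "resp_code", "domain", "categories",
]

# Sample event from the question.
event = ('"2015-01-01 20:39:57","client1","client1,site1","1.1.1.1",'
         '"2.2.2.2","Allowed","1 (A)","NOERROR","www.google.com.",'
         '"Search Engines"')

# csv.reader honours the double quotes, so a comma inside a quoted
# field does not start a new field.
values = next(csv.reader([event]))
record = dict(zip(FIELDS, values))

print(record["identities"])   # still one string: client1,site1
print(record["categories"])
```

The quoted fields stay whole, which is why a second step is needed to break "identities" and "categories" into separate values.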
But this still leaves "identities" and "categories" unsplit: each one comes out as a single comma-separated string.
So I added a second transform, to work on the categories field:
**props.conf:**
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3
REPORT-opendns-category = opendns_aws_s3_category
**transforms.conf:**
[opendns_aws_s3_category]
SOURCE_KEY=categories
DELIMS = ","
FIELDS = category
MV_ADD=true
Here I did something wrong, because this isn't working. I get no new field named "category", and the "categories" field is unchanged.
Should I maybe not have added the FIELDS= entry? This was to name the new field. But that was perhaps not a good idea?
How else can I name this as a new field?
Try this:
props.conf:
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3, opendns_aws_s3_category
transforms.conf:
[opendns_aws_s3]
DELIMS = ","
FIELDS = timestamp,granular_id,identities,internal_ip,external_ip,action,query_type,resp_code,domain,categories
[opendns_aws_s3_category]
SOURCE_KEY=categories
REGEX = ([^,]+)(?:,|$)
FORMAT = category::$1
MV_ADD=true
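To see what that pattern captures, here is a minimal Python sketch that runs the same regex over two category strings quoted elsewhere in this thread. It assumes Splunk applies the pattern repeatedly across the `categories` value and collects each capture as a separate value of `category`; the helper name `split_categories` is made up for the example.

```python
import re

# Same pattern as the REGEX setting above.
CATEGORY_RE = re.compile(r"([^,]+)(?:,|$)")


def split_categories(value):
    """Return every capture-group match, roughly mimicking MV_ADD=true."""
    return CATEGORY_RE.findall(value)


print(split_categories("Software/Technology,Business Services"))
print(split_categories("Adult Themes,Nudity,Pornography,Sexuality"))
```

Spaces and slashes are kept inside each value because the character class only excludes the comma.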
Are the values in "categories" separated by doublequote-comma-doublequote, or by just a comma? A few more examples with multiple values would be great.
It's just the comma. Only the original field is enclosed in quotes.
Values vary a lot, some domains fit into 4-5 categories. Actual values may contain spaces and slashes.
Could be stuff like:
"Software/Technology,Business Services" (2 categories)
"Adult Themes,Nudity,Pornography,Sexuality" (4 categories)
(disappointingly, that last one shows up frequently just because we have a monitor running to confirm the filter is in place... sad, I know 😉 )
This was solved by clearing up the props.conf stanza:
This doesn't work:
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3
REPORT-opendns-category = opendns_aws_s3_category
This works:
[opendns:dnslog]
REPORT-opendns-fields = opendns_aws_s3, opendns_aws_s3_category
Thanks to woodcock for the right syntax.
Yes, otherwise they are processed in alphabetical order of the class name, and your order was wrong ("c" comes before "f"): the category transform ran before the "categories" field existed.
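As a quick sanity check of that ordering, here is a tiny Python sketch that sorts the two setting names from the stanza that did not work. A plain string sort is only a stand-in for the ordering described in the reply, not Splunk's actual code.

```python
# Setting names from the props.conf stanza that did not work.
settings = ["REPORT-opendns-fields", "REPORT-opendns-category"]

# Plain string sort, standing in for the ordering described above.
ordered = sorted(settings)

for name in ordered:
    print(name)
```

The category setting sorts first, so its transform would look for a `categories` field that the other transform has not produced yet. Listing both transforms on one `REPORT-` line makes the order explicit.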
Thanks, I'll run that.
I expect regex will do the trick.
I was kinda hoping that, since Splunk has a built-in mechanism for handling delimited values, that would be the obvious and most efficient choice.
Problem solved: I found the answer in your post, but in a different part than you might've intended...
I changed the props.conf stanza so that both transforms were on the same line.
That did it!
So, thanks for clearing up my syntax mistake 🙂