I have my Apache access logs going to CloudWatch Logs in AWS. I used to use the AWS add-on (TA) for Splunk to collect the events from CloudWatch Logs through its built-in API calls. Because of API limits, I switched to a Kinesis stream and a Lambda function that sends the events to the Splunk HTTP Event Collector. The data now arrives as a JSON payload that looks like the following:
{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}
Splunk extracts the fields message and aws_account_name, but obviously this will not be recognized by splunk as access_combined_wcookie because it cannot extract the fields. My thought is to drop the json object names before indexing because I don't care about them, only the data.
I thought this would work but maybe I am misunderstanding how my regex is being handled in transforms.
My config is as follows:
Transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing]
REGEX = "message": "(.*")"
DEST_KEY = _raw
FORMAT = $1
Props.conf:
[access_combined_wcookie]
TRANSFORMS-ACCESS = setnull, setparsing
This doesn't seem to be working. The idea was to replace _raw with whatever matched in the regex grouping to be indexed. I don't get anything though. Do I have to have another step to send it back to the queue or is the regex flawed? Any help would be appreciated.
Sorry I meant to reply to this a while back with what I got working. Below is my transforms, similar to yours above:
[setparsing]
REGEX = \"message\": \"(.*\")\"
DEST_KEY= _raw
FORMAT = $1
[ApacheAccessLogs]
SEDCMD-removeescape = s/\\//g
TRANSFORMS-ACCESS = setparsing, Apache_Access
The one thing missing from the answer above is that since the original _raw message had escaped quotes for the json, you have to strip out the escape characters. I used the SEDCMD to accomplish this. You can ignore the Apache_Access in the TRANSFORMS-ACCESS. That is unique to my case as I had to transform the events from one sourcetype to the standard sourcetype of access_combined_wcookie due to configs outside of my control.
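To see what the two steps above do to the sample event from the question, here is a rough Python sketch. It only mimics the regex capture and the backslash removal; it does not model Splunk's pipeline or the order in which Splunk applies SEDCMD and TRANSFORMS. The names RAW_EVENT, SETPARSING and rewrite_raw are made up for the illustration.

```python
import re

# Sample HEC payload from the question, kept as-is. The raw string keeps the
# backslashes in front of the inner quotes as literal characters.
RAW_EVENT = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# Same pattern as REGEX in [setparsing]; a backslash before a quote is a
# no-op in the regex, so it is left out here.
SETPARSING = re.compile(r'"message": "(.*")"')


def rewrite_raw(event: str) -> str:
    """Return what _raw would become: capture group 1 with backslashes removed."""
    match = SETPARSING.search(event)
    if match is None:
        return event  # no match: the event is left unchanged
    captured = match.group(1)           # still contains \" sequences
    return captured.replace("\\", "")   # rough equivalent of s/\\//g


print(rewrite_raw(RAW_EVENT))
```

The printed line is a plain Apache access log entry with unescaped quotes, which is the shape the access_combined_wcookie extractions expect.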
The sed command is probably too aggressive, because it also removes escaped backslashes. The regular expression that extracts the message also doesn't cover all the characters. Matching a repeated alternation of either an escaped quote or any other non-quote character makes the capture span exactly one JSON string.
REGEX = \"message\":\s*\"((?:\\\"|[^\"])*)\"
SEDCMD-unescapetab = s/\\t/ /g
SEDCMD-unescapenewline = s/\\n/\n/g
SEDCMD-removeescape = s/\\(.)/\1/g
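As a rough check of the difference between the two sed expressions, the Python sketch below runs both on the sample event and on a made-up value containing an escaped backslash. The message pattern is written with balanced parentheses, with the repetition applied to the whole alternation. The request string `GET /a\\b HTTP/1.1` and all function names are assumptions for illustration only, and Splunk's own processing order is not modeled.

```python
import re

# Sample payload from the question (raw string, backslashes are literal).
RAW_EVENT = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# Either an escaped quote or any single non-quote character, repeated.
MESSAGE = re.compile(r'"message":\s*"((?:\\"|[^"])*)"')


def strip_all_backslashes(text: str) -> str:
    """Rough equivalent of s/\\//g: every backslash is deleted."""
    return re.sub(r'\\', '', text)


def unescape_one_level(text: str) -> str:
    """Rough equivalent of s/\\(.)/\1/g: drop the backslash, keep what it escaped."""
    return re.sub(r'\\(.)', r'\1', text)


captured = MESSAGE.search(RAW_EVENT).group(1)
clean = unescape_one_level(captured)

# Hypothetical JSON-escaped value with an escaped backslash in the path.
escaped_backslash = r'GET /a\\b HTTP/1.1'

print(clean)
print(strip_all_backslashes(escaped_backslash))  # the backslash is lost
print(unescape_one_level(escaped_backslash))     # one backslash survives
```

On the sample event both approaches give the same line; they only differ once the message itself contains a backslash.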
Please accept an answer.
@hnelsonit, did you end up still using the setnull transform as well? Or did your solution effectively index only the parsed field?
No, the regex in [setparsing] does all the work in that it is only sending what is in the capture group to _raw
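For anyone wondering what happens if setnull stays in front, here is a toy Python model of the original pair of transforms. It assumes that transforms in one TRANSFORMS- class run in the listed order and that an event whose queue ends up as nullQueue is discarded. The event dictionary and the run helper are invented for this sketch and are not how Splunk represents events internally.

```python
import re

RAW_EVENT = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'


def setnull(event: dict) -> None:
    # REGEX = .  /  DEST_KEY = queue  /  FORMAT = nullQueue
    if re.search(r'.', event["_raw"]):
        event["queue"] = "nullQueue"


def setparsing(event: dict) -> None:
    # REGEX = "message": "(.*")"  /  DEST_KEY = _raw  /  FORMAT = $1
    match = re.search(r'"message": "(.*")"', event["_raw"])
    if match:
        event["_raw"] = match.group(1)


def run(transforms, raw: str) -> dict:
    """Apply each transform in order to a fresh toy event."""
    event = {"_raw": raw, "queue": "indexQueue"}
    for transform in transforms:
        transform(event)
    return event


with_setnull = run([setnull, setparsing], RAW_EVENT)
without_setnull = run([setparsing], RAW_EVENT)

print(with_setnull["queue"])     # nullQueue: _raw was rewritten, but the event is still dropped
print(without_setnull["queue"])  # indexQueue
```

In this model nothing ever sets the queue back to indexQueue once setparsing writes to _raw instead, which would explain why the original configuration indexed nothing.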
I would try it like this. Your current configuration looks like it was derived from the event filtering configuration, which I don't think you need.
Transforms.conf:
[convert_access]
REGEX = ^\{\"message\":\s*\"(.+)\",\s*\"aws_account_name.+$
DEST_KEY = _raw
FORMAT = $1
Props.conf:
[access_combined_wcookie]
TRANSFORMS-ACCESS = convert_access
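A quick way to see what this anchored regex captures is to run it against the sample payload from the question. The Python sketch below does only that; the variable names are made up, and it does not reproduce anything else Splunk does at index time.

```python
import re

RAW_EVENT = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# Same pattern as REGEX in [convert_access]; the backslashes before the
# quotes are dropped because they are a no-op in the regex.
CONVERT_ACCESS = re.compile(r'^\{"message":\s*"(.+)",\s*"aws_account_name.+$')

new_raw = CONVERT_ACCESS.match(RAW_EVENT).group(1)

print(new_raw)
print('\\"' in new_raw)  # True: the JSON escapes are still in the capture
```

The capture is the log line alone, but the inner quotes are still preceded by backslashes, so a separate unescaping step is still needed before the access log extractions can match.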
Your regex does match (for your example) just the log message:
12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"
as tested at https://regex101.com/, so the regex seems valid.
Maybe it's the transform format. From the documentation https://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Transformsconf
it says in the regex section:
So maybe the regex should be?:
[setparsing]
REGEX = "message": "(?<_raw>.*")"