
How to index only the data payload from JSON so that it is recognized as the standard access_combined_wcookie sourcetype?

Explorer

I have my Apache access logs going to CloudWatch Logs in AWS. I used to collect the events from CloudWatch Logs with the AWS add-on (TA) for Splunk, which uses the built-in API calls. Because of API limits, I switched to a Kinesis stream and a Lambda function that send the events to the Splunk HTTP Event Collector. The data now arrives as a JSON payload that looks like the following:

{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}

Splunk extracts the fields message and aws_account_name, but the event will not be recognized as access_combined_wcookie, because the access-log fields cannot be extracted from inside the JSON. My idea is to drop the JSON object names before indexing, because I don't care about them, only about the data.
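To see what is inside the wrapper, the payload can be decoded outside Splunk. The Python sketch below is purely illustrative and not part of any Splunk configuration. It parses the sample event with the standard json module. Once the JSON layer is gone, the message value is an ordinary access-log line, and each escaped quote (\") has turned into a plain double quote.

```python
import json

# The sample event as it arrives from the HTTP Event Collector. The r-string keeps
# the JSON escapes (\") exactly as they appear in _raw.
RAW = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

payload = json.loads(RAW)

# JSON decoding turns each \" into a plain double quote.
print(payload["message"])           # the bare access-log line
print(payload["aws_account_name"])  # anon-op-prod
```

The first line printed is 12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] "-" 408 - "-" "-" "-", which is the text the access_combined_wcookie extractions expect to see as _raw.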

I thought this would work, but maybe I am misunderstanding how my regex is handled in transforms.

My config is as follows:

transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = "message": "(.*")"
DEST_KEY = _raw
FORMAT = $1

props.conf:

[access_combined_wcookie]
TRANSFORMS-ACCESS = setnull, setparsing

This doesn't seem to be working. The idea was to replace _raw with whatever the regex capture group matched and index that, but I don't get any events at all. Do I need another step to send the event back to the index queue, or is the regex flawed? Any help would be appreciated.
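The symptom fits how the two stanzas interact. setnull matches every event (REGEX = .) and sets its queue to nullQueue. setparsing only rewrites _raw and never sets the queue back to indexQueue, so the event is still dropped. The Python sketch below is a deliberately simplified model of that ordering. The apply_transforms helper and the dict-based event are hypothetical and do not reflect Splunk internals. It applies the two stanzas from the question in TRANSFORMS-ACCESS order.

```python
import re

# Sample event as received by HEC. The backslashes are literal characters in _raw.
RAW = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# The two stanzas from the question as (REGEX, DEST_KEY, FORMAT) tuples.
TRANSFORMS = {
    "setnull": (r".", "queue", "nullQueue"),
    "setparsing": (r'"message": "(.*")"', "_raw", "$1"),
}


def apply_transforms(raw, names):
    """Hypothetical model: apply transforms in order, drop events left on nullQueue."""
    event = {"_raw": raw, "queue": "indexQueue"}
    for name in names:
        regex, dest_key, fmt = TRANSFORMS[name]
        match = re.search(regex, event["_raw"])
        if match is None:
            continue  # a non-matching transform leaves the event unchanged
        # This model only understands FORMAT = $1 or a literal value.
        event[dest_key] = match.group(1) if fmt == "$1" else fmt
    return event if event["queue"] == "indexQueue" else None


# With setnull first, _raw is rewritten but the queue stays nullQueue: prints None.
print(apply_transforms(RAW, ["setnull", "setparsing"]))
# Without setnull, the rewritten _raw survives (escaped quotes still included).
print(apply_transforms(RAW, ["setparsing"])["_raw"])
```

The event-filtering pattern these stanza names come from normally pairs setnull with a second stanza that uses DEST_KEY = queue and FORMAT = indexQueue. A stanza that writes to _raw does not reset the queue, so in this model the simplest fix is to drop setnull.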

Explorer

Sorry, I meant to reply to this a while back with what I got working. Below is my configuration, similar to yours above:

# transforms.conf
[setparsing]
REGEX = \"message\": \"(.*\")\"
DEST_KEY = _raw
FORMAT = $1

# props.conf
[ApacheAccessLogs]
SEDCMD-removeescape = s/\\//g
TRANSFORMS-ACCESS = setparsing, Apache_Access

The one thing missing from the answer above is that, because the original _raw message had escaped quotes from the JSON encoding, you have to strip out the escape characters. I used SEDCMD to do this. You can ignore Apache_Access in TRANSFORMS-ACCESS: it is unique to my case, because I had to convert the events from another sourcetype to the standard access_combined_wcookie sourcetype due to configs outside my control.
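The accepted configuration can be checked outside Splunk. The Python sketch below is illustrative only and approximates the behavior rather than reproducing Splunk's implementation. It applies the setparsing REGEX and then emulates SEDCMD-removeescape = s/\\//g with re.sub. Note that this sed expression deletes every backslash in the event, not only the ones in front of quotes.

```python
import re

RAW = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# transforms.conf [setparsing]: REGEX = \"message\": \"(.*\")\" with FORMAT = $1.
# In PCRE and in Python's re alike, \" simply matches a literal double quote.
SETPARSING = re.compile(r'\"message\": \"(.*\")\"')


def setparsing(raw):
    """Replace _raw with capture group 1, or keep it unchanged if there is no match."""
    match = SETPARSING.search(raw)
    return match.group(1) if match else raw


def remove_escapes(raw):
    r"""Emulate SEDCMD-removeescape = s/\\//g: delete every backslash."""
    return re.sub(r"\\", "", raw)


print(remove_escapes(setparsing(RAW)))
```

For this sample, the output is 12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] "-" 408 - "-" "-" "-". The result is the same whichever of the two steps runs first, so the sketch does not depend on the exact order in which Splunk applies TRANSFORMS and SEDCMD.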

SplunkTrust

Please accept an answer.

Path Finder

@hnelsonit, did you end up still using the setnull transform as well? Or did your solution effectively index only the parsed field?

Explorer

No. The regex in [setparsing] does all the work: it writes only what is in the capture group to _raw.

SplunkTrust

I would try it like this. Your current configuration looks like it was derived from the event-filtering configuration, which I don't think you need.

transforms.conf:

[convert_access]
REGEX = ^\{\"message\":\s*\"(.+)\",\s*\"aws_account_name.+$
DEST_KEY = _raw
FORMAT = $1

props.conf:

[access_combined_wcookie]
TRANSFORMS-ACCESS = convert_access
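A quick way to see what convert_access captures is to run its REGEX against the sample event. The Python sketch below does that. It is illustrative only, and it relies on Python's re treating \{ and \" as literal characters, as PCRE does. The capture still contains the JSON escape backslashes, which is the gap the accepted answer closes with SEDCMD.

```python
import re

RAW = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# REGEX from the [convert_access] stanza: anchored at both ends, greedy capture
# that stops at the last '",<spaces>"aws_account_name' in the event.
CONVERT_ACCESS = re.compile(r'^\{\"message\":\s*\"(.+)\",\s*\"aws_account_name.+$')

match = CONVERT_ACCESS.search(RAW)
new_raw = match.group(1) if match else RAW  # FORMAT = $1

print(new_raw)  # the access-log line, still containing \" sequences
```

Because the pattern is anchored with ^ and $ and requires the aws_account_name key, an event with a different layout would not match and would keep its original _raw in this model.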

Splunk Employee

For your example, your regex does match just the log message:

12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"

as tested at https://regex101.com/, so the regex itself seems valid.

Maybe the problem is the transform FORMAT. The transforms.conf documentation (https://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Transformsconf) says in the REGEX section:

  • If the REGEX extracts both the field name and its corresponding field value, you can use the following special capturing groups if you want to skip specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>.
  • For example, the following are equivalent:
    • Using FORMAT:
      • REGEX = ([a-z]+)=([a-z]+)
      • FORMAT = $1::$2
    • Without using FORMAT:
      • REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
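The two forms in the quoted documentation can be compared with a small Python model. This is illustrative only: Python spells named groups (?P<name>...), while Splunk's PCRE also accepts the (?<name>...) form quoted above. The sample string user=alice is made up. Both forms yield the same field name and value.

```python
import re

SAMPLE = "user=alice"  # made-up sample string


def with_format(text):
    # REGEX = ([a-z]+)=([a-z]+), FORMAT = $1::$2: group 1 names the field, group 2 is its value.
    match = re.search(r"([a-z]+)=([a-z]+)", text)
    return {match.group(1): match.group(2)} if match else {}


def with_key_val(text):
    # REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+): the special group names carry the mapping.
    match = re.search(r"(?P<_KEY_1>[a-z]+)=(?P<_VAL_1>[a-z]+)", text)
    return {match.group("_KEY_1"): match.group("_VAL_1")} if match else {}


print(with_format(SAMPLE))   # {'user': 'alice'}
print(with_key_val(SAMPLE))  # {'user': 'alice'}
```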

So maybe the regex should be:

[setparsing]
REGEX = "message": "(?<_raw>.*")"
