Splunk Search

How to index only the data payload from json so that it will be recognized as a standard access_combined_wcookie sourcetype?

hnelsonit
Explorer

I have my Apache access logs going to CloudWatch Logs in AWS. I used to use the AWS add-on (TA) for Splunk to collect the events from CloudWatch Logs using the built-in API calls. Due to API limits, I switched to a Kinesis stream and a Lambda function that send the events to the Splunk HTTP Event Collector. The data now comes in as a JSON payload that looks like the following:

{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}

Splunk extracts the fields message and aws_account_name, but obviously the event will not be recognized as access_combined_wcookie, because the access-log fields cannot be extracted from inside the JSON wrapper. My thought is to drop the JSON key names before indexing, because I only care about the data.

I thought this would work, but maybe I am misunderstanding how my regex is handled in transforms.

My config is as follows:

Transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = "message": "(.*")"
DEST_KEY = _raw
FORMAT = $1

Props.conf:

[access_combined_wcookie]
TRANSFORMS-ACCESS = setnull, setparsing

This doesn't seem to be working. The idea was to replace _raw with whatever matched in the regex grouping to be indexed. I don't get anything though. Do I have to have another step to send it back to the queue or is the regex flawed? Any help would be appreciated.

0 Karma
1 Solution

hnelsonit
Explorer

Sorry, I meant to reply to this a while back with what I got working. Below are my transforms.conf and props.conf stanzas, similar to yours above:

Transforms.conf:

[setparsing]
REGEX = \"message\": \"(.*\")\"
DEST_KEY = _raw
FORMAT = $1

Props.conf:

[ApacheAccessLogs]
SEDCMD-removeescape = s/\\//g
TRANSFORMS-ACCESS = setparsing, Apache_Access

The one thing missing from the answer above is that since the original _raw message had escaped quotes for the json, you have to strip out the escape characters. I used the SEDCMD to accomplish this. You can ignore the Apache_Access in the TRANSFORMS-ACCESS. That is unique to my case as I had to transform the events from one sourcetype to the standard sourcetype of access_combined_wcookie due to configs outside of my control.
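
To see what the [setparsing] capture and the backslash-stripping SEDCMD do to the sample event from the question, here is a minimal Python sketch. It is only an approximation: Python's re module stands in for Splunk's regex engine, rewrite_raw is a hypothetical helper name, and the sketch does not try to model the order in which Splunk applies SEDCMD and TRANSFORMS.

```python
import re

# Sample event from the question, as a raw string so the backslashes
# stay exactly as they would appear in _raw.
raw = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'

# Same pattern as the [setparsing] REGEX above.
SETPARSING = re.compile(r'\"message\": \"(.*\")\"')


def rewrite_raw(event: str) -> str:
    """Hypothetical helper: mimic FORMAT = $1 plus SEDCMD s/\\//g."""
    match = SETPARSING.search(event)
    if match is None:
        # No match: leave the event unchanged.
        return event
    captured = match.group(1)            # what FORMAT = $1 would write to _raw
    return re.sub(r'\\', '', captured)   # strip every backslash, like s/\\//g


print(rewrite_raw(raw))
```

The printed line is a plain access-log entry with unescaped quotes, which is the shape the access_combined_wcookie extractions expect.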

0 Karma

malvidin
Communicator

The sed command is probably too aggressive, because it removes every backslash, including escaped backslashes. The regular expression used to extract the message also doesn't cover every character that can appear in the string. An alternation that matches either an escaped quote or any non-quote character will capture any single JSON string.

REGEX = \"message\":\s*\"((?:\\\"|[^\"])*)\"

SEDCMD-unescapetab = s/\\t/    /g

SEDCMD-unescapenewline = s/\\n/\n/g

SEDCMD-removeescape = s/\\(.)/\1/g
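
The difference between the two sed expressions can be sketched in Python. This is an illustration under assumptions, not Splunk itself: the event below is made up (a request path containing a JSON-escaped backslash), the function names are hypothetical, and the pattern is a parenthesis-balanced form of the alternation idea described above.

```python
import re

# Alternation: an escaped quote, or any single non-quote character.
MESSAGE = re.compile(r'\"message\":\s*\"((?:\\\"|[^\"])*)\"')


def strip_all_backslashes(text: str) -> str:
    """Equivalent of s/\\//g: drops every backslash."""
    return re.sub(r'\\', '', text)


def unescape_pairs(text: str) -> str:
    """Equivalent of s/\\(.)/\1/g: keeps the character after each backslash."""
    return re.sub(r'\\(.)', r'\1', text)


# Hypothetical event whose message contains an escaped backslash (\\).
event = r'{"message": "GET \"/a\\b\" 200", "aws_account_name": "demo"}'

captured = MESSAGE.search(event).group(1)
print(captured)                         # still JSON-escaped
print(strip_all_backslashes(captured))  # the literal backslash is lost
print(unescape_pairs(captured))         # the literal backslash survives
```

With the pair-wise version, an escaped backslash collapses to one literal backslash instead of disappearing.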

 

0 Karma

richgalloway
SplunkTrust

Please accept an answer.

---
If this reply helps you, Karma would be appreciated.
0 Karma

seegeekrun
Path Finder

@hnelsonit, did you end up still using the setnull transform as well? Or did your solution effectively index only the parsed field?

0 Karma

hnelsonit
Explorer

No, the regex in [setparsing] does all the work: it sends only what is in the capture group to _raw.

0 Karma

somesoni2
SplunkTrust

I would try it like this. Your current configuration looks like it was derived from the event filtering configuration, which I don't think you need.

Transforms.conf:

[convert_access]
REGEX = ^\{\"message\":\s*\"(.+)\",\s*\"aws_account_name.+$
DEST_KEY = _raw
FORMAT = $1

Props.conf:

 [access_combined_wcookie]
 TRANSFORMS-ACCESS = convert_access
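
As a quick check of what this REGEX would capture from the sample event in the question, here is a small Python sketch. Python's re module is used as a stand-in for Splunk's regex engine, and convert_access_capture is a hypothetical helper name used only for illustration.

```python
import re

# Same pattern as the [convert_access] REGEX above.
CONVERT_ACCESS = re.compile(r'^\{\"message\":\s*\"(.+)\",\s*\"aws_account_name.+$')

# Sample event from the question, backslashes preserved via a raw string.
raw = r'{"message": "12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"", "aws_account_name": "anon-op-prod"}'


def convert_access_capture(event: str):
    """Hypothetical helper: return what FORMAT = $1 would write to _raw, or None."""
    match = CONVERT_ACCESS.match(event)
    return match.group(1) if match else None


captured = convert_access_capture(raw)
print(captured)
```

The capture is the access-log line alone, but it still contains the JSON escape characters (\") around the quoted fields.
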
0 Karma

pgreer_splunk
Splunk Employee

Your regex does match (for your example) just the log message:

12.156.22.149 - - [09/Dec/2016:20:20:44 -0500] \"-\" 408 - \"-\" \"-\" \"-\"

as tested at https://regex101.com/, so the regex seems valid.

Maybe it's the transform format. From the documentation https://docs.splunk.com/Documentation/Splunk/6.5.1/Admin/Transformsconf

it says in the regex section:

  • If the REGEX extracts both the field name and its corresponding field value, you can use the following special capturing groups if you want to skip specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>.
  • For example, the following are equivalent:
    • Using FORMAT:
      • REGEX = ([a-z]+)=([a-z]+)
      • FORMAT = $1::$2
    • Without using FORMAT:
      • REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

So maybe the regex should be:

[setparsing]
REGEX = "message": "(?<_raw>.*")"

0 Karma