All Apps and Add-ons

Amazon Kinesis Firehose to Splunk with CloudWatch data - does anyone have a blueprint that changes the source?

SplunkTrust

After reading various blog posts and the AWS Kinesis Firehose application documentation, we eventually worked out how to get data into Splunk from AWS Kinesis Firehose.

Our latest issue is that in the AWS config, CloudWatch -> Log Groups -> Streams has various AWS streams set up that send into Kinesis Firehose and finally into Splunk.
This is technically working; however, there is no way to determine which "stream" sent the data to Splunk.

For the AWS Lambda blueprint we are using the Kinesis Firehose CloudWatch Logs Processor; we also tested the Kinesis Firehose Process Record Streams as source option, but that didn't get any data in.
The first blueprint works well, but the source field in Splunk is always the same, and the raw data doesn't include the stream the data came from.
Setting up a separate config for each Amazon stream would be a nightmare to maintain, so we need to solve this in the AWS config somewhere...

Has anyone solved this issue?
I suspect we need to create a new AWS Lambda blueprint, based on the Kinesis Firehose CloudWatch Logs Processor, with the ability to customise the source field when sending to the HEC port. But if there is an easier way, or if someone has already done this, that would be even better!
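For anyone attempting the same thing, here is a minimal sketch of the transformation a customised blueprint would need to perform. The shipped Kinesis Firehose CloudWatch Logs Processor blueprint is Node.js; this Python version, and the `decorate_events` name and prefix format, are my own illustration of the logic, not the actual blueprint code.

```python
# Sketch: decode one Firehose record carrying a CloudWatch Logs
# subscription payload and prefix each log event with its origin,
# so Splunk can later derive the source from the raw data.
import base64
import gzip
import json


def decorate_events(firehose_record_data):
    """Return the record's log messages, each tagged with its
    log group and log stream."""
    # CloudWatch Logs subscription data arrives base64-encoded and gzipped
    payload = json.loads(gzip.decompress(base64.b64decode(firehose_record_data)))
    # Only DATA_MESSAGE payloads contain real log events
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    group = payload["logGroup"]
    stream = payload["logStream"]
    return [
        f"[log_group={group}] [log_stream={stream}] {event['message']}"
        for event in payload["logEvents"]
    ]
```

A real blueprint would re-encode these strings into the Firehose output records; the sketch stops at the decoration step, which is the part that solves the source problem.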

1 Solution

SplunkTrust

I don't have the Lambda code, but I believe it was straightforward; if anyone really needs it I can ask the developer...
Effectively, a new blueprint was created that includes log_group= and log_stream= fields in the raw data (in the format `[log_group=<name>] [log_stream=<name>]`).

Note there was an extra space at the start of the log lines. That could probably be fixed in the Lambda code, but I've handled it in the regex. In this example I override the source field in Splunk with the log group/log stream names.

props.conf

#The first transform sets the source from the log_group/log_stream values in the raw data,
#and the second rewrites _raw to strip that prefix so events look cleaner in Splunk
[aws:firehose:cloudwatchevents]
TRANSFORMS-changeFirehoseSource = firehoseSourceOverride
TRANSFORMS-removeFirehoseLogStream = firehoseSourceRemoval

transforms.conf

#Amazon Kinesis Firehose does not currently set a source, so override it based on the customised raw data prefix,
#which looks like
# [log_group=nonprod-logs] [log_stream=docker/nonprod-platforms/i-0af36bb54f276debc] time=...
[firehoseSourceOverride]
REGEX = ^\s*\[log_group=([^\]]+)\]\s+\[log_stream=([^\]]+)
DEST_KEY = MetaData:Source
FORMAT = source::$1/$2

[firehoseSourceRemoval]
REGEX = ^\s*\[log_group=[^\]]+\]\s+\[log_stream=[^\]]+\]\s(.*)
FORMAT = $1
DEST_KEY = _raw
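As a quick sanity check, the two patterns can be exercised against the sample event from the comment above. This uses Python's `re` module rather than Splunk's PCRE engine, but for these patterns the behaviour is the same.

```python
import re

# Sample raw event, including the leading space mentioned above
RAW = " [log_group=nonprod-logs] [log_stream=docker/nonprod-platforms/i-0af36bb54f276debc] time=..."

# Same patterns as firehoseSourceOverride and firehoseSourceRemoval
source_re = re.compile(r"^\s*\[log_group=([^\]]+)\]\s+\[log_stream=([^\]]+)")
removal_re = re.compile(r"^\s*\[log_group=[^\]]+\]\s+\[log_stream=[^\]]+\]\s(.*)")

m = source_re.match(RAW)
source = f"{m.group(1)}/{m.group(2)}"     # mirrors FORMAT = source::$1/$2
cleaned = removal_re.match(RAW).group(1)  # mirrors the rewrite of _raw
```

After both transforms run, the event's source is `nonprod-logs/docker/nonprod-platforms/i-0af36bb54f276debc` and the `[log_group=...] [log_stream=...]` prefix is gone from `_raw`.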



Engager

I would like the Lambda code as well; I'm running into the same situation. It's hard for me to transition from Kinesis to Kinesis Firehose as recommended when the source is changed to the collector token name, or to the source configured on the collector token. My users are accustomed to their logs having a source based on the log group name. We have log groups subscribing to a single Kinesis stream today in each account/region, and that would be preserved to a certain degree when transitioning to Firehose.


Engager

I found this and it was helpful for me to get the log group and stream information:

https://www.splunk.com/blog/2019/02/21/how-to-ingest-any-log-from-aws-cloudwatch-logs-via-firehose.h...


Splunk Employee

Can you please share the Lambda function?
