
Amazon Kinesis Firehose to Splunk with CloudWatch data - does anyone have a blueprint that changes the source?

gjanders
SplunkTrust

After reading various blog posts, such as this one, and the AWS Kinesis Firehose application documentation, we eventually worked out how to get data into Splunk from AWS Kinesis Firehose.

Our newest issue is that, in the AWS config, CloudWatch -> Log Groups -> Streams has various AWS streams set up that send into Kinesis Firehose and finally into Splunk.
This is technically working; however, there is no way to determine which "stream" sent the data to Splunk.

For the AWS Lambda blueprint we are using the Kinesis Firehose Cloudwatch Logs Processor. We also tested the Kinesis Firehose Process Record Streams as source option, but that didn't get any data in.
The first blueprint works great, but the source field in Splunk is always the same and the raw data doesn't include the stream the data came from.
Setting up a separate config for each Amazon stream would be a nightmare to maintain, so we need to solve this somewhere in the AWS config...

Has anyone solved this issue?
I suspect we need to create a new AWS Lambda blueprint, based on Kinesis Firehose Cloudwatch Logs Processor, that can customise the source field when sending to the HEC port. If there is an easier way, or someone has already done this, that would be even better!

1 Solution

gjanders
SplunkTrust

I don't have the Lambda code, but I believe it was straightforward; if anyone really needs it I can ask the developer...
Effectively, a new blueprint was created that adds log_group= and log_stream= fields to the start of the raw data, in the format [log_group=<name>] [log_stream=<name>].
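
Since the original Lambda code was never posted, here is a minimal Python sketch of what such a Firehose transformation function could look like. It is not the developer's code. It assumes the standard CloudWatch Logs subscription payload (base64-encoded, gzip-compressed JSON with messageType, logGroup, logStream and logEvents) and the usual Firehose transformation response (recordId, result, data). The function names transform_record and lambda_handler are illustrative choices, and the prefix mirrors the sample line shown in the transforms.conf comments.

```python
import base64
import gzip
import json


def transform_record(record):
    """Prefix every log event in one Firehose record with its log group and stream."""
    # CloudWatch Logs delivers the payload as base64-encoded, gzip-compressed JSON.
    payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

    # CONTROL_MESSAGE records only test the subscription and carry no real events.
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"recordId": record["recordId"], "result": "Dropped"}

    # Assumed prefix format, matching the sample in transforms.conf.
    prefix = "[log_group={}] [log_stream={}] ".format(
        payload["logGroup"], payload["logStream"]
    )
    lines = [prefix + event["message"].rstrip("\n") for event in payload["logEvents"]]
    data = ("\n".join(lines) + "\n").encode("utf-8")

    return {
        "recordId": record["recordId"],
        "result": "Ok",
        "data": base64.b64encode(data).decode("ascii"),
    }


def lambda_handler(event, context):
    # Firehose expects one output record per input record, keyed by recordId.
    return {"records": [transform_record(r) for r in event["records"]]}
```

This sketch leaves out the handling of oversized responses that the AWS blueprint includes, so treat it as a starting point rather than a drop-in replacement.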

Note there was an extra space at the start of the log lines. That could probably be fixed in the Lambda code, but I've handled it in the regex. In this example I override the source field in Splunk with the log_group/log_stream names.

props.conf

#The first transform sets source to <log_group>/<log_stream>, taken from the prefix in the raw data,
#and the second transform then strips that prefix from _raw to make it look cleaner in Splunk
[aws:firehose:cloudwatchevents]
TRANSFORMS-changeFirehoseSource = firehoseSourceOverride
TRANSFORMS-removeFirehoseLogStream = firehoseSourceRemoval

transforms.conf

#Amazon Kinesis Firehose does not currently set a source, so override it based on the customised prefix in the raw data,
#which looks like
# [log_group=nonprod-logs] [log_stream=docker/nonprod-platforms/i-0af36bb54f276debc] time=...
[firehoseSourceOverride]
REGEX = ^\s*\[log_group=([^\]]+)\]\s+\[log_stream=([^\]]+)
DEST_KEY = MetaData:Source
FORMAT = source::$1/$2

[firehoseSourceRemoval]
REGEX = ^\s*\[log_group=[^\]]+\]\s+\[log_stream=[^\]]+\]\s(.*)
FORMAT = $1
DEST_KEY = _raw
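
To sanity-check the two regexes before deploying them, a small Python sketch like the one below can help. It only approximates what Splunk does at index time: Splunk uses PCRE, and Python's re module should behave the same way for these particular patterns. The preview helper and the sample event (including its timestamp and level fields) are made up for illustration.

```python
import re

# Same patterns as the two transforms.conf stanzas above.
SOURCE_RE = re.compile(r"^\s*\[log_group=([^\]]+)\]\s+\[log_stream=([^\]]+)")
REMOVAL_RE = re.compile(r"^\s*\[log_group=[^\]]+\]\s+\[log_stream=[^\]]+\]\s(.*)")


def preview(raw):
    """Return (source, cleaned_raw) roughly as the two transforms would produce them."""
    source_match = SOURCE_RE.search(raw)
    # Mirrors FORMAT = source::$1/$2; None means the source would be left unchanged.
    source = None
    if source_match:
        source = "{}/{}".format(source_match.group(1), source_match.group(2))

    removal_match = REMOVAL_RE.search(raw)
    # Mirrors FORMAT = $1 written to _raw; non-matching events are left as they are.
    cleaned = removal_match.group(1) if removal_match else raw
    return source, cleaned


# Hypothetical event with the leading space mentioned in the post.
sample = (
    " [log_group=nonprod-logs] "
    "[log_stream=docker/nonprod-platforms/i-0af36bb54f276debc] "
    "time=2019-01-01T00:00:00Z level=info"
)
print(preview(sample))
```

Events that lack the prefix come back with a source of None and unchanged raw text, which matches how a non-matching transform leaves an event alone.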


cweatherall
Engager

I would like the Lambda code as well; I'm running into the same situation. It's hard for me to transition away from Kinesis to Kinesis Firehose as recommended when the source is changed to the collector token name or to the source set in the collector token. My users are accustomed to their logs having a source based on the log group name. We have log groups subscribing to a single Kinesis stream today in accounts/regions. That would be preserved to a certain degree when transitioning to Firehose.


cweatherall
Engager

I found this and it was helpful for me to get the log group and stream information:

https://www.splunk.com/blog/2019/02/21/how-to-ingest-any-log-from-aws-cloudwatch-logs-via-firehose.h...


akira_splunk
Splunk Employee

Can you please share the Lambda function?
