We’re excited to announce a powerful update to Splunk Data Management with added support for Amazon Data Firehose in Edge Processor! This enhancement enables you to use Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) as a data source, offering greater flexibility and efficiency in managing data streams. With integrations across more than 20 AWS services, you can now easily stream data into Splunk from sources like Amazon CloudWatch, SNS, AWS WAF, Network Firewall, IoT, and more.
With this update, Edge Processor can directly ingest logs from Amazon Data Firehose, enabling seamless streaming from various AWS services into Splunk for real-time analysis and visualization. Whether you’re monitoring cloud infrastructure, applications, or security events, this addition broadens your data source options, enhances your ability to gain real-time insights, and simplifies data pipeline management, reducing latency and giving you faster access to critical data.
This release also introduces another crucial feature in Edge Processor: receiver acknowledgement for upstream HTTP Event Collector (HEC) data. This preserves data integrity by ensuring that HEC events sent to the processor are properly received and acknowledged, providing an additional layer of confidence that no information is lost in transit between data inputs and Edge Processors.
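For context, senders that use Splunk’s standard HEC indexer acknowledgment flow send each event on a channel, receive an ackId in the response, and then poll the ack endpoint to confirm delivery. The sketch below illustrates that flow from the sender’s side only; the hostname, port, token, and channel ID are placeholders, and the exact acknowledgement behavior of your Edge Processor should be confirmed against the official documentation.

```python
import uuid
import requests

# Placeholder values; substitute your Edge Processor's HEC endpoint and token.
HEC_URL = "https://my-edge-processor.example.com:8088"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"
CHANNEL = str(uuid.uuid4())  # a channel ID is required for acknowledgements

headers = {
    "Authorization": f"Splunk {HEC_TOKEN}",
    "X-Splunk-Request-Channel": CHANNEL,
}

# Send an event; when acknowledgements are enabled, the response includes an ackId.
resp = requests.post(
    f"{HEC_URL}/services/collector/event",
    headers=headers,
    json={"event": "hello from an acknowledged HEC sender", "sourcetype": "test"},
)
ack_id = resp.json().get("ackId")

# Poll the ack endpoint to confirm the event was durably received.
ack_resp = requests.post(
    f"{HEC_URL}/services/collector/ack",
    headers=headers,
    json={"acks": [ack_id]},
)
print(ack_resp.json())  # e.g. {"acks": {"0": true}} once the event is acknowledged
```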
In the following sections, we’ll guide you through how to integrate Amazon Data Firehose into your existing Splunk setup. Specifically, we’ll focus on setting up a HEC token for your Edge Processor, configuring VPC flow log ingestion into Splunk via Amazon Data Firehose, and achieving network traffic CIM compliance using SPL2 pipelines. An architectural diagram illustrating the high-level components involved in this setup can be seen below.
You can also view this step-by-step guide in Lantern.
Note: The following steps assume you already have access to the following: an Edge Processor tenant with a paired EC stack, an Edge Processor instance running on a machine with an accessible URL, and an AWS account. Furthermore, to ensure proper data ingestion, your Edge Processors’ HEC receivers should accept data over TLS—not mTLS. This can be configured in your tenant’s web UI.
HEC tokens are used by the HTTP Event Collector to authenticate and authorize data sent to Splunk. These tokens securely manage data intake from various sources over HTTP/HTTPS, ensuring that only authorized data is accepted and properly categorized for analysis. Fortunately, the process of generating and setting up a token for use within your Edge Processor is relatively straightforward:
Now that a valid HEC token has been generated, it’s time to apply it to your Edge Processor:
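Once the token has been added to your Edge Processor, a quick way to confirm it is being accepted is to send a test event directly to the instance’s HEC endpoint. The following is a minimal sketch; the URL and token are placeholders for your own environment.

```python
import requests

# Placeholder endpoint and token; replace with your Edge Processor's URL and HEC token.
resp = requests.post(
    "https://my-edge-processor.example.com:8088/services/collector/event",
    headers={"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"},
    json={"event": "HEC token smoke test", "sourcetype": "test"},
)
resp.raise_for_status()
print(resp.json())  # a successful send returns {"text": "Success", "code": 0}
```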
VPC flow logs capture essential information about the IP traffic to and from network interfaces in your Virtual Private Cloud. By streaming these logs through Amazon Data Firehose, you can efficiently route the data to Edge Processor for real-time processing and analysis, enabling deeper insights within your Splunk environment. To set this up, you’ll first need to create a Firehose stream:
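If you’d rather script this step than click through the AWS console, the same stream can be created with the AWS SDK. The boto3 sketch below uses placeholder names, ARNs, and endpoints, and assumes you already have an S3 bucket and IAM role available for backing up failed deliveries; adjust the endpoint type and values to match your setup.

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Placeholder names, ARNs, and endpoint; adjust for your environment.
firehose.create_delivery_stream(
    DeliveryStreamName="vpc-flow-logs-to-edge-processor",
    DeliveryStreamType="DirectPut",
    SplunkDestinationConfiguration={
        # Your Edge Processor instance's HEC endpoint and the token created earlier.
        "HECEndpoint": "https://my-edge-processor.example.com:8088",
        "HECEndpointType": "Raw",  # or "Event", depending on how you want records wrapped
        "HECToken": "00000000-0000-0000-0000-000000000000",
        "HECAcknowledgmentTimeoutInSeconds": 180,
        "RetryOptions": {"DurationInSeconds": 300},
        # Failed events are backed up to S3 so nothing is silently dropped.
        "S3BackupMode": "FailedEventsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-backup-role",
            "BucketARN": "arn:aws:s3:::my-firehose-backup-bucket",
        },
    },
)
```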
To test whether you’ve configured everything correctly before moving on, navigate to your newly created Firehose stream and expand the panel titled “Test with demo data”. After you click the “Start sending demo data” button, dummy data should be routed from your Firehose stream through your Edge Processor instance. To verify this is working as expected, select the “Edge Processors” tab on the left-hand side of your tenant’s UI and double-click the row containing your Edge Processor. Within a minute or two, the “Data flowing through in the last 30 minutes” metrics in the bottom-right corner of the page should reflect a small amount of inbound data, likely categorized by the default source and sourcetype values specified previously. If this isn’t the case, check your Firehose stream’s destination error logs in Amazon CloudWatch.
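If you want to pull those error logs programmatically, Firehose writes delivery failures to a CloudWatch log group that is typically named after the stream (assuming error logging is enabled on the stream). A small boto3 sketch, with the region and stream name as placeholders:

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Firehose delivery errors typically land in /aws/kinesisfirehose/<stream-name>.
events = logs.filter_log_events(
    logGroupName="/aws/kinesisfirehose/vpc-flow-logs-to-edge-processor",
)
for event in events.get("events", []):
    print(event["message"])
```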
With the Firehose stream now configured to send data to your Edge Processor instance, the final step is to create a VPC flow log and direct it to the Firehose stream:
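As with the Firehose stream, this step can be scripted instead of done through the console. The boto3 sketch below assumes a placeholder VPC ID and the ARN of the Firehose stream created above, and leaves the log format at its default so the record layout matches the pipeline built in the next section.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder VPC ID and Firehose stream ARN; replace with your own.
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",  # capture both accepted and rejected traffic
    LogDestinationType="kinesis-data-firehose",
    LogDestination=(
        "arn:aws:firehose:us-east-1:123456789012:"
        "deliverystream/vpc-flow-logs-to-edge-processor"
    ),
    MaxAggregationInterval=60,  # publish flow records every 60 seconds
)
```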
At this point, you should begin to see VPC flow logs populating the destination specified by your Edge Processor. If routing to Splunk Cloud Platform, you can identify these logs by searching for the default source and sourcetype values defined previously. Again, in the event something has gone wrong, checking the Firehose stream’s destination error logs is a great starting point for debugging.
With VPC flow logs now successfully ingested into Edge Processor, the next step is to transform these logs to align with the CIM Network Traffic data model. By leveraging specific SPL2 commands, we can build and apply a pipeline that maps the flow log fields to their CIM equivalents. This will ensure the data is normalized, enabling consistent and effective analysis across Splunk’s search and reporting capabilities. To accomplish this, we must first create an SPL2 pipeline:
Now that a new pipeline has been created, we can use various SPL2 commands to extract information from the flow log and map it to CIM-compliant field names. For AWS flow logs specifically, the default record format (referenced in Step 6 of the previous section) is of the form: ${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}. According to Splunk’s field mapping documentation, achieving CIM compliance means mapping account-id, interface-id, srcaddr, dstaddr, srcport, dstport, and protocol to the CIM fields vendor_account, dvc, src_ip, dest_ip, src_port, dest_port, and transport respectively, carrying packets and bytes over as-is, and deriving a duration field from the start and end timestamps.
The next step involves implementing these changes in code. Notably, the rex command can be used to parse the raw flow log, extracting only the fields that are essential for compliance. Fields like version, action, and log-status, which are not required, are intentionally excluded from this extraction so that only necessary information is retained. Additionally, the pipeline should calculate the duration of the network session from the provided start and end timestamps in order to align with the data model specified by the CIM. Finally, the fields command removes the start and end fields from the log, as they are no longer needed once duration has been calculated. Here’s an example of what the resulting SPL2 may look like:
$pipeline = | from $source
| rex field=_raw /{"message":"\S+ (?P<vendor_account>\S+) (?P<dvc>\S+) (?P<src_ip>\S+) (?P<dest_ip>\S+) (?P<src_port>\S+) (?P<dest_port>\S+) (?P<transport>\S+) (?P<packets>\S+) (?P<bytes>\S+) (?P<start>\S+) (?P<end>\S+) \S+ \S+"}/
| eval duration = end - start
| fields - start, end
| into $destination;
Now that all the data transformation logic is in place, the only remaining step is to save the pipeline and apply it to your running Edge Processor:
Logs routed to your specified destination should now contain the CIM-compliant fields added by the pipeline above.
With the introduction of Amazon Data Firehose support in Edge Processor, managing and analyzing your AWS data streams has never been easier. This update not only expands your data source options but also enhances the reliability of data transmission with receiver acknowledgement for upstream HEC data. Whether you’re monitoring cloud infrastructure, analyzing security events, or ensuring CIM compliance, these new capabilities provide you with the tools needed to optimize your Splunk environment. We encourage you to explore these features and see how they can enhance your data processing workflows.
To get started with one (or both!) of our Data Management pipeline builders, fill out the following form. For more Edge Processor resources, check out the Data Management Resource Hub. If you’d like to request a feature or provide any other feedback, we strongly encourage you to create a Splunk Idea and/or send an email to edgeprocessor@splunk.com. You can also join the lively discussion in the #edge-processor channel of the splunk-usergroups workspace in Slack. It’s an excellent forum to learn from the community about the latest Edge Processor use cases.
Happy Splunking!