The Splunk Product Best Practices team provided this response. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.
Send data directly into the indexing tier using Splunk HTTP Event Collector (HEC) and AWS Lambda. With AWS Lambda, you can push events from AWS using an AWS service trigger and run code without provisioning or managing servers, with continuous scaling.
Executives might prefer using Lambda to get data into Splunk because of the lower cost, while system admins can appreciate the reduced operational complexity: low friction, low latency, and scalability. Users also gain access to a library of blueprints and use cases. After the data is in Splunk, they can download and configure the Splunk App for AWS to access advanced dashboards and sophisticated traffic and security analysis for VPC Flow Logs.
Using AWS Lambda
AWS Lambda is equivalent to on-demand, ephemeral compute in a legacy data center. There are no servers to manage, and it provides continuous scaling and sub-second metering. AWS Lambda supports code written in Node.js, Python, Java (Java 8 compatible), and C# (.NET Core), and the code can include existing libraries, including native ones.
Amazon CloudWatch service monitors and manages operational data, including Virtual Private Cloud (VPC) flow logs. In a Splunk deployment, use this service to complete the following activities:
Populate the Splunk App for AWS dashboards for Topology, VPC Flow Logs – Traffic Analysis, and VPC Flow Logs – Security Analysis.
Collect logs from CloudWatch agents or AWS services where you can't use universal forwarders, such as CloudWatch Agents (OS logs), VPC Flow Logs, ECS, and WAF logs.
Use the VPC Flow Log feature to monitor IP traffic going to and from network interfaces in your VPC.
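To make the shape of this data concrete, the following is a minimal Python sketch that splits one VPC Flow Log record into named fields. It assumes the default (version 2) record format; custom flow log formats use a different field order. The function name and the sample record are illustrative only, and the Splunk Add-on for AWS performs its own field extraction, so this is not something you need to deploy.

```python
# Field names of the default (version 2) VPC Flow Log record format, in order.
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log_record(line):
    """Split one space-delimited flow log record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(
            f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(FLOW_LOG_FIELDS, values))


# Made-up sample record, for illustration only.
sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_log_record(sample)
```

All values stay as strings here; a search or dashboard would typically convert ports, packet counts, and byte counts to numbers.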
Best practice: Use Lambda to get data into Splunk if you have a cloud deployment or your environment has a high volume of event-based data collection that doesn't need event acknowledgment in Splunk. Read the Getting Data Into (GDI) Splunk From AWS white paper to compare AWS Lambda to other serverless, push models.
Best practice: Use Lambda to get data into Splunk for the following data sources: Amazon GuardDuty, Amazon Macie, sources in Amazon CloudWatch Events, and Amazon Elastic Load Balancing (ELB) and Amazon Application Load Balancer (ALB) logs.
Best practice: Use a serverless, push model such as AWS Lambda to collect Amazon CloudWatch Logs and Amazon VPC flow logs from AWS. Using a modular input with a pull model can cause AWS to rate limit a customer for calling the API too frequently, which stops data collection through the API.
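In the push model, CloudWatch Logs hands the Lambda function a base64-encoded, gzip-compressed JSON payload. The following Python sketch shows one way a function could unpack that payload and shape each log event for HEC. The function names are hypothetical, the choice of log group as the HEC source is an assumption, and the splunk-cloudwatch-logs-processor blueprint has its own implementation that may differ.

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event):
    """Return the decoded CloudWatch Logs payload carried in a Lambda event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_hec_events(payload, sourcetype="aws:cloudwatchlogs:vpcflow"):
    """Map each CloudWatch log event to one HEC event dictionary."""
    return [
        {
            # CloudWatch timestamps are in milliseconds; HEC expects seconds.
            "time": log_event["timestamp"] / 1000.0,
            # Assumption: use the log group name as the HEC source.
            "source": payload["logGroup"],
            "sourcetype": sourcetype,
            "event": log_event["message"],
        }
        for log_event in payload.get("logEvents", [])
    ]
```

A handler would call decode_cloudwatch_event on the incoming event, pass the result to to_hec_events, and then post the resulting list to the HEC endpoint.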
Before using Lambda to get data into Splunk, you need a basic understanding of Node.js and to consider the following items:
Splunk doesn't acknowledge events
Lambda can't handle some custom data types or non-AWS native events
Lambda might drop events if there is a failure between Splunk and AWS
If there is a failure between Splunk and AWS, you can collect the dropped events with an Amazon S3 bucket or an Amazon CloudWatch group. You can also set up a heavy forwarder to pull events later to recover the dropped events.
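Because HEC does not acknowledge events in this setup, any recovery logic has to live on the sending side. The following Python sketch illustrates the general idea: retry a batch a few times, then hand it to a fallback so it can be collected later. The send and fallback callables are hypothetical placeholders; in a real function, fallback might write the batch to an S3 bucket, which is not shown here.

```python
import time


def deliver_with_fallback(batch, send, fallback, attempts=3,
                          base_delay=0.5, sleep=time.sleep):
    """Try send(batch) up to `attempts` times, then hand the batch to fallback.

    Returns True if the batch was sent, False if it went to the fallback.
    """
    for attempt in range(attempts):
        try:
            send(batch)
            return True
        except Exception:
            # Back off exponentially before the next attempt, if one remains.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    fallback(batch)
    return False
```

Keep the total retry time well below the Lambda function timeout, otherwise the function can be terminated before the fallback runs.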
Set up and configure
To complete the following procedure, you need baseline experience with AWS and Splunk and full admin rights for the AWS console and Splunk deployment.
Install the Splunk Add-on for Amazon Web Services.
Create an event collector token with the following specifications to send to the AWS admin:
Name: aws:vpcflow
Source Type: aws:cloudwatchlogs:vpcflow
Enable SSL: Select this option
HTTP Port Number: Use 443 for Splunk Cloud. Use 8088 for Splunk Enterprise.
See Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual for more details.
Note: From the AWS Console, the AWS admin must use the splunk-cloudwatch-logs-processor blueprint to create a new Lambda function with the following environment variables to send the VPC Flow Logs to Splunk using the HEC and the token:
SPLUNK_HEC_URL – https://<ip>:8088/services/collector
SPLUNK_HEC_TOKEN
Configure a CloudWatch Logs input using Splunk Web.
Use the following search to verify you can see the VPC Flow Log events in Splunk.
index=main sourcetype=aws:cloudwatchlogs:vpcflow
Configure an SQS-based S3 input using Splunk Web for the Splunk Add-on for AWS. Configure an input for each of the following data types: CloudTrail, CloudFront Access Log, Config, ELB Access Logs, S3 Access Logs, and custom data types.
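To show how the SPLUNK_HEC_URL and SPLUNK_HEC_TOKEN environment variables from the procedure above are typically used, the following Python sketch builds an HEC request with the standard library. The function name is hypothetical and the blueprint itself is written in Node.js, so treat this as an illustration of the request shape rather than the blueprint's code.

```python
import json
import os
import urllib.request


def build_hec_request(events, url=None, token=None):
    """Build a POST request that carries a batch of events to HEC."""
    url = url or os.environ["SPLUNK_HEC_URL"]
    token = token or os.environ["SPLUNK_HEC_TOKEN"]
    # HEC accepts several JSON event objects in one request body.
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# To send the request (not executed here):
# with urllib.request.urlopen(build_hec_request(events), timeout=10) as reply:
#     print(reply.read())
```

Building the request separately from sending it makes the URL, token header, and body easy to inspect when you troubleshoot.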
Verify and troubleshoot
Ensure that the HEC port, load balancer, or endpoints can accept data.
To verify the HEC, open a command prompt and type one of the following cURL statements:
Splunk Enterprise: bash$ curl -k https://<host>:8088/services/collector -H 'Authorization: Splunk <token>' -d '{"sourcetype": "mysourcetype", "event":"Hello, World!"}'
Splunk Cloud: bash$ curl -k https://http-inputs-<host>.splunkcloud.com:443/services/collector -H 'Authorization: Splunk <token>' -d '{"event":"Hello, World!"}'
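When HEC accepts an event, it replies with a small JSON body such as {"text":"Success","code":0}. If you script this check instead of reading the cURL output by eye, a helper like the following Python sketch can interpret the reply. The function name is hypothetical, and it treats anything other than code 0 as a failure.

```python
import json


def hec_accepted(response_body):
    """Return True if an HEC response body reports success (code 0)."""
    try:
        reply = json.loads(response_body)
    except (TypeError, ValueError):
        # Not JSON at all, for example an HTML error page from a load balancer.
        return False
    return isinstance(reply, dict) and reply.get("code") == 0
```

A non-JSON reply usually means the request never reached HEC, which points to the URL, port, or load balancer rather than the token.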
Test a function to make sure that the AWS deployment is sending data to Splunk.
To test, select a function to test, select the template for CloudWatch Logs, then run the test.
Note: If the test fails, see the CloudWatch logs for the AWS Lambda function to see where the failure occurred. A common issue is a wrong URL for Splunk. Make sure you are sending to a secure HTTPS endpoint; unsecured endpoints are generally not set up by default.
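If you want a test event that carries your own log lines instead of the console template's sample data, you can build one locally. The following Python sketch wraps a list of messages in the same base64-encoded, gzip-compressed envelope that CloudWatch Logs uses. The function name, account ID, log group, log stream, and filter name are all placeholder values.

```python
import base64
import gzip
import json


def make_cloudwatch_test_event(messages,
                               log_group="example-log-group",
                               log_stream="example-log-stream"):
    """Wrap log messages in a CloudWatch Logs style Lambda test event."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",  # placeholder account ID
        "logGroup": log_group,
        "logStream": log_stream,
        "subscriptionFilters": ["example-filter"],
        "logEvents": [
            {"id": str(index), "timestamp": 1418530010000 + index, "message": message}
            for index, message in enumerate(messages)
        ],
    }
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


# Print a test event that can be pasted into the Lambda console test dialog.
print(json.dumps(make_cloudwatch_test_event(["Hello, World!"]), indent=2))
```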
Refer to the Troubleshooting section in the How to stream AWS CloudWatch Logs to Splunk (Hint: it’s easier than you think) Splunk Blog.
See the following resources from Splunk for more details:
See VPC Flow Logs on YouTube
Use AWS Lambda with HTTP Event Collector
Create a Lambda function using the splunk-logging blueprint (Node.js)