I've heard that using AWS Lambda is a great way to get high volumes of data directly into Splunk without the overhead of managing hardware. It seems like a great solution. Can you provide an overview to help me get started?
Send data directly into the indexing tier using the Splunk HTTP Event Collector (HEC) and AWS Lambda. With AWS Lambda, you can push events from AWS using an AWS service trigger and run code without provisioning or managing servers, with continuous scaling.
Executives might prefer using Lambda to get data into Splunk because of its lower cost, while system admins appreciate the reduced operational complexity: low friction, low latency, and scalability. Users also gain access to a library of blueprints and use cases. After the data is in Splunk, they can download and configure the Splunk App for AWS to access advanced dashboards and sophisticated traffic and security analysis for VPC Flow Logs.
AWS Lambda is the equivalent of on-demand, ephemeral compute in a legacy data center. There are no servers to manage, and it provides continuous scaling and sub-second metering. AWS Lambda supports code written in Node.js, Python, Java (Java 8 compatible), and C# (.NET Core), and functions can include existing libraries, including native libraries.
The Amazon CloudWatch service monitors and manages operational data, including Virtual Private Cloud (VPC) flow logs. In a Splunk deployment, use this service to complete the following activities:
Best practice: Use Lambda to get data into Splunk if you have a cloud deployment or your environment has a high volume of event-based data collection that doesn't need event acknowledgment in Splunk. Read the Getting Data Into (GDI) Splunk From AWS white paper to compare AWS Lambda to other serverless, push models.
Best practice: Use Lambda to get data into Splunk for the following data sources: Amazon GuardDuty, Amazon Macie, sources in Amazon CloudWatch Events, Amazon Elastic Load Balancing (ELB), and Amazon Application Load Balancer (ALB) logs.
Best practice: Use a serverless, push model such as AWS Lambda to collect Amazon CloudWatch Logs and Amazon VPC flow logs from AWS. Using a modular, pull model can cause AWS to rate limit a customer for accessing the API too frequently and to stop data collection through the API.
Before using Lambda to get data into Splunk, you need a basic understanding of Node.js and to consider the following items:
If there is a failure between AWS and Splunk, you can capture the dropped events in an Amazon S3 bucket or an Amazon CloudWatch Logs log group. You can also set up a heavy forwarder to pull the dropped events later and recover them.
To complete the following procedure, you need baseline experience with AWS and Splunk and full admin rights for the AWS console and Splunk deployment.
Use port 443 for Splunk Cloud. Use port 8088 for Splunk Enterprise.
See Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual for more details.
Note: From the AWS Console, the AWS admin must use the splunk-cloudwatch-logs-processor blueprint to create a new Lambda function with the following environment variables, which send the VPC Flow Logs to Splunk using HEC and the token:
SPLUNK_HEC_URL: https://<ip>:8088/services/collector
SPLUNK_HEC_TOKEN
index=main sourcetype=aws:cloudwatchlogs:vpcflow
bash$ curl -k https://<host>:8088/services/collector -H 'Authorization: Splunk <token>' -d '{"sourcetype": "mysourcetype", "event":"Hello, World!"}'
Splunk Cloud: bash$ curl -k https://http-inputs-<host>.splunkcloud.com:443/services/collector -H 'Authorization: Splunk <token>' -d '{"event":"Hello, World!"}'
Test the function to make sure that AWS is sending data to the Splunk deployment.
To test, select the function, select the test template for CloudWatch Logs, then run the test.
Note: If the test fails, check the CloudWatch logs for the AWS Lambda function to see where the failure occurred. A common issue is a wrong URL for Splunk. Make sure you are sending to a secure HTTPS endpoint; unsecured endpoints are generally not enabled by default.
Refer to the Troubleshooting section in the Splunk blog post How to stream AWS CloudWatch Logs to Splunk (Hint: it's easier than you think).
See the following resources from Splunk for more details: