
Where can I find an overview of getting data into Splunk using AWS Lambda?

Splunk Employee

I've heard that using AWS Lambda is a great way to get high volumes of data directly into Splunk without the overhead of managing hardware. It seems like a great solution. Can you provide an overview to help me get started?


Re: Where can I find an overview of getting data into Splunk using AWS Lambda?

Splunk Employee

The Splunk Product Best Practices team provided this response. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.

Send data directly into the indexing tier using Splunk HTTP Event Collector (HEC) and AWS Lambda. With AWS Lambda, you can push events from AWS with an AWS services trigger and run code without provisioning or managing servers, with continuous scaling.

Executives might prefer using Lambda to get data into Splunk because of the lower cost. System admins can appreciate the reduced operational complexity, along with low friction, low latency, and scalability. Users also gain access to a library of blueprints and use cases. After the data is in Splunk, they can download and configure the Splunk App for AWS to access advanced dashboards and sophisticated traffic and security analysis for VPC Flow Logs.

Using AWS Lambda

AWS Lambda is equivalent to on-demand, ephemeral compute in a legacy data center. There are no servers to manage, plus it provides continuous scaling and sub-second metering. AWS Lambda also supports code written in Node.js, Python, Java (Java 8 compatible), and C# (.NET Core), and your code can include existing libraries, including native ones.

The Amazon CloudWatch service monitors and manages operational data, including Virtual Private Cloud (VPC) flow logs. In a Splunk deployment, use this service to complete the following activities:

  • Populate the Splunk App for AWS dashboards for Topology, VPC Flow Logs – Traffic Analysis, and VPC Flow Logs – Security Analysis.
  • Collect logs from CloudWatch agents or AWS services where you can't use universal forwarders, such as CloudWatch Agents (OS logs), VPC Flow Logs, ECS, and WAF logs.
  • Use the VPC Flow Log feature to monitor IP traffic going to and from network interfaces in your VPC.
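To make the CloudWatch Logs path more concrete, here is a minimal Python sketch of what a Lambda function receives from a CloudWatch Logs subscription: a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The helper names (`decode_cloudwatch_payload`, `parse_flow_record`), the sample log group, and the sample record are hypothetical, and the field list assumes the default (version 2) VPC Flow Log format. This is not the code the Splunk blueprint uses.

```python
import base64
import gzip
import json

# Field order of the default (version 2) VPC Flow Log record format.
FLOW_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def decode_cloudwatch_payload(event):
    """Decode the base64 + gzip JSON that CloudWatch Logs passes to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def parse_flow_record(message):
    """Split one space-delimited flow log line into a dict keyed by field name."""
    return dict(zip(FLOW_FIELDS, message.split()))


# Build a made-up sample event the same way CloudWatch Logs would encode it.
sample_line = (
    "2 123456789012 eni-0abc 10.0.0.5 10.0.1.9 44321 443 6 10 840 "
    "1600000000 1600000060 ACCEPT OK"
)
payload = {
    "logGroup": "vpc-flow-logs",  # hypothetical log group name
    "logStream": "eni-0abc-all",
    "logEvents": [
        {"id": "1", "timestamp": 1600000000000, "message": sample_line},
    ],
}
sample_event = {
    "awslogs": {
        "data": base64.b64encode(
            gzip.compress(json.dumps(payload).encode("utf-8"))
        ).decode("ascii")
    }
}

decoded = decode_cloudwatch_payload(sample_event)
records = [parse_flow_record(e["message"]) for e in decoded["logEvents"]]
```

Each entry in `records` is a plain dict, so a function could forward either the raw `message` string or the parsed fields, depending on how the source type is configured in Splunk.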

Best practice: Use Lambda to get data into Splunk if you have a cloud deployment or your environment has a high volume of event-based data collection that doesn't need event acknowledgment in Splunk. Read the Getting Data Into (GDI) Splunk From AWS white paper to compare AWS Lambda to other serverless, push models.

Best practice: Use Lambda to get data into Splunk for the following data sources: Amazon GuardDuty, Amazon Macie, sources in Amazon CloudWatch Events, Amazon Elastic Load Balancing (ELB), and Amazon Application Load Balancer (ALB) logs.

Best practice: Use a serverless, push model such as AWS Lambda to collect Amazon CloudWatch Logs and Amazon VPC flow logs from AWS. Using a modular, pull model can cause AWS to rate limit a customer for calling the API too frequently, which stops data collection through the API.

Before using Lambda to get data into Splunk, you need a basic understanding of Node.js. Also consider the following limitations:

  • Splunk doesn't acknowledge events
  • Lambda can't handle some custom data types or non-AWS native events
  • Lambda might drop events if there is a failure between Splunk and AWS

If there is a failure between Splunk and AWS, you can collect the dropped events in an Amazon S3 bucket or an Amazon CloudWatch log group. You can also set up a heavy forwarder to pull and recover those events later.
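As a rough illustration of that recovery idea, the Python sketch below tries to deliver each event and hands anything that fails to a backup callable. The function name `deliver_with_fallback` and both callables are hypothetical. In a real function, `send` would post to HEC and `backup` might write to an S3 bucket; here both are stand-ins so the logic can be run without any AWS or Splunk access.

```python
def deliver_with_fallback(events, send, backup):
    """Try to send each event; pass the ones that fail to `backup`.

    Returns the number of events that were delivered.
    """
    delivered = 0
    failed = []
    for event in events:
        try:
            send(event)
            delivered += 1
        except OSError:
            # Network-level failures (connection refused, timeouts, URL
            # errors) are OSError subclasses in Python.
            failed.append(event)
    if failed:
        backup(failed)
    return delivered


# Stand-ins for demonstration only: a sender that fails on even ids,
# and a list that plays the role of the backup location.
backed_up = []


def flaky_send(event):
    if event["id"] % 2 == 0:
        raise ConnectionError("HEC unreachable")


count = deliver_with_fallback(
    [{"id": 1}, {"id": 2}, {"id": 3}], flaky_send, backed_up.extend
)
```

Because HEC does not acknowledge events in this setup, a fallback like this only catches failures that are visible to the sender, such as a refused connection or an error response.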

Set up and configure

To complete the following procedure, you need baseline experience with AWS and Splunk and full admin rights for the AWS console and Splunk deployment.

  1. Install the Splunk Add-on for Amazon Web Services.
  2. Create an event collector token with the following specifications to send to the AWS admin:
    Name: aws:vpcflow
    Source Type: aws:cloudwatchlogs:vpcflow
    Enable SSL: Select this option
    HTTP Port Number: Use 443 for Splunk Cloud. Use 8088 for Splunk Enterprise.
    See Set up and use HTTP Event Collector in Splunk Web in the Getting Data In manual for more details.
    Note: From the AWS Console, the AWS admin must use the splunk-cloudwatch-logs-processor blueprint to create a new Lambda function with the following environment variables to send the VPC Flow Logs to Splunk using the HEC and the token:
    SPLUNK_HEC_URL – https://<ip>:8088/services/collector
    SPLUNK_HEC_TOKEN – <token>
  3. Configure a CloudWatch Logs input using Splunk Web.
  4. Use the following search to verify you can see the VPC Flow Log events in Splunk:
    index=main sourcetype=aws:cloudwatchlogs:vpcflow
  5. Configure an SQS-based S3 input using Splunk Web for the Splunk Add-on for AWS. Configure an input for each of the following data types: CloudTrail, CloudFront Access Log, Config, ELB Access Logs, S3 Access Logs, and custom data types.
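For anyone curious how the token and URL from step 2 fit together on the Lambda side, here is a minimal Python sketch that builds (but does not send) an HEC request. It assumes the environment variables are named `SPLUNK_HEC_URL` and `SPLUNK_HEC_TOKEN`. The function name `build_hec_request`, the example host, and the placeholder token are hypothetical, and the actual blueprint is written in Node.js, so treat this as an illustration rather than the blueprint's implementation.

```python
import json
import os
import urllib.request


def build_hec_request(log_events, source, env=os.environ):
    """Build one HEC POST request carrying a batch of log events."""
    url = env["SPLUNK_HEC_URL"]
    token = env["SPLUNK_HEC_TOKEN"]
    # HEC accepts several event objects concatenated in one request body.
    body = "".join(
        json.dumps(
            {
                "time": e["timestamp"] / 1000,  # HEC expects epoch seconds
                "source": source,
                "sourcetype": "aws:cloudwatchlogs:vpcflow",
                "event": e["message"],
            }
        )
        for e in log_events
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# Placeholder settings; a deployed function would read os.environ instead.
demo_env = {
    "SPLUNK_HEC_URL": "https://splunk.example.com:8088/services/collector",
    "SPLUNK_HEC_TOKEN": "00000000-0000-0000-0000-000000000000",
}
demo_events = [
    {"timestamp": 1600000000000, "message": "first flow record"},
    {"timestamp": 1600000060000, "message": "second flow record"},
]
req = build_hec_request(demo_events, "vpc-flow-logs", demo_env)
```

Passing the request to `urllib.request.urlopen(req)` would send it. The sketch stops short of that so it can be run without a reachable Splunk endpoint.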

Verify and troubleshoot

  1. Ensure the HEC port, load balancer, or endpoints can accept data. To verify the HEC, open a command prompt and type one of the following cURL statements:
    Splunk Enterprise:
    bash$ curl -k https://<host>:8088/services/collector -H 'Authorization: Splunk <token>' -d '{"sourcetype": "mysourcetype", "event":"Hello, World!"}'
    Splunk Cloud:
    bash$ curl -k https://<host>:443/services/collector -H 'Authorization: Splunk <token>' -d '{"event":"Hello, World!"}'
  2. Test the Lambda function to make sure that the AWS deployment is sending data to Splunk.
    To test, select the function, select the template for CloudWatch Logs, then run the test.
    Note: If the test fails, see the CloudWatch logs for the AWS Lambda function to see where the failure occurred. A common issue is a wrong URL for Splunk. Make sure you are sending to a secure HTTPS endpoint; unsecured endpoints are generally not set up by default.

  3. Refer to the Troubleshooting section in the How to stream AWS CloudWatch Logs to Splunk (Hint: it’s easier than you think) Splunk Blog.
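When checking the cURL output or the Lambda logs, it can help to know what a healthy HEC reply looks like. The Python sketch below treats a reply as accepted when the HTTP status is 200 and the JSON body carries `"code": 0`, which is how HEC reports success. The helper name `hec_accepted` is hypothetical, and the sample bodies are illustrative values, not output captured from a real deployment.

```python
import json


def hec_accepted(status, body):
    """Return True when an HEC reply signals that the event was accepted."""
    try:
        reply = json.loads(body)
    except ValueError:
        # A non-JSON body usually means something other than HEC answered,
        # for example a load balancer error page.
        return False
    return status == 200 and isinstance(reply, dict) and reply.get("code") == 0


ok = hec_accepted(200, '{"text":"Success","code":0}')
bad_token = hec_accepted(403, '{"text":"Invalid token","code":4}')
not_json = hec_accepted(502, "<html>Bad Gateway</html>")
```

A non-JSON reply, as in the last case, often points at the wrong URL or a load balancer in front of HEC rather than at the token.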

See the following resources from Splunk for more details:
