I am running the Splunk Add-On for AWS, updated to version 4.0.0 as of tonight, and I'm mostly interested in CloudWatch Logs events. I understand that each input has a polling interval, and I've set mine to 60 seconds for a sample log group. When I run a real-time search with that log group as the source over a 5-minute window, the initial results come from already-indexed events, usually 30-45 seconds old. No new events stream into the search, and after 5 minutes the window is completely empty. If I refresh the search, it shows a set of events that should have qualified for the real-time window, and then they age out again.
I can run real-time searches against other sources, so I don't see it being an issue of insufficient permissions for my role. I can't find any documentation that indicates these sources wouldn't be visible to real-time searches. Am I doing something wrong, or is this a limitation of the add-on's design?
We now use the Splunk HTTP Event Collector (HEC). Even a single-node HEC configuration can ingest a higher volume of log data than the rate-limited Splunk Add-On for AWS, and events arriving via HEC are visible to real-time searches. We ultimately configured an auto-scaling group of HEC nodes in our AWS account. Our event pipeline starts with a CloudWatch Logs subscription that sends each log group of interest to a single Kinesis stream; from there, a Python AWS Lambda function consumes the stream and forwards events to HEC.
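For reference, here is a rough sketch of what that Lambda does, assuming a hypothetical HEC endpoint and token supplied via environment variables; the real function also has batching, retries, error handling, and TLS/certificate handling that are omitted here. CloudWatch Logs subscription data arrives on each Kinesis record base64-encoded and gzipped, and HEC expects timestamps in seconds rather than the milliseconds CloudWatch uses.

```python
import base64
import gzip
import json
import os
import urllib.request

# Placeholders: point these at your own HEC endpoint and token.
HEC_URL = os.environ.get("HEC_URL", "https://hec.example.internal:8088/services/collector/event")
HEC_TOKEN = os.environ["HEC_TOKEN"]


def lambda_handler(event, context):
    batch = []
    for record in event["Records"]:
        # Kinesis delivers the CloudWatch Logs payload base64-encoded and gzipped.
        payload = gzip.decompress(base64.b64decode(record["kinesis"]["data"]))
        data = json.loads(payload)

        # Subscription filters also emit CONTROL_MESSAGE records; skip them.
        if data.get("messageType") != "DATA_MESSAGE":
            continue

        for log_event in data["logEvents"]:
            batch.append(json.dumps({
                "time": log_event["timestamp"] / 1000.0,  # CWL ms -> HEC seconds
                "source": data["logGroup"],
                "sourcetype": "aws:cloudwatchlogs",
                "event": log_event["message"],
            }))

    if not batch:
        return {"sent": 0}

    # HEC accepts multiple events in a single POST as concatenated JSON objects.
    req = urllib.request.Request(
        HEC_URL,
        data="\n".join(batch).encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return {"sent": len(batch)}
```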
We've had to manually bump the shard count on that single Kinesis stream to handle some volume spikes. We will probably switch to Kinesis Firehose eventually, which handles scaling automatically.
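For what it's worth, the manual shard bump is a one-liner with boto3; the stream name and target count below are placeholders for our setup, not the actual values we used.

```python
import boto3

# Sketch of the manual reshard we ran during volume spikes.
kinesis = boto3.client("kinesis")
kinesis.update_shard_count(
    StreamName="cwl-to-splunk",       # placeholder stream name
    TargetShardCount=4,               # e.g. doubling from 2 shards
    ScalingType="UNIFORM_SCALING",    # split/merge shards evenly
)
```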
I've noticed the same thing, and I can't find any information pointing me to a reason for it.
I manually played with some of the time-related fields in the plugin config, and nothing seemed to affect the delay.
I'd be curious to find out why this is happening and how to fix it (if it can be fixed); otherwise this plugin may be useless to us.
We never got an answer to the "why" question. We also uncovered another limitation: this version of the Add-On uses a method for ingesting CloudWatch Logs that is constrained by a hard rate limit by AWS. AWS limits each account to 10 CWL log detail requests per second through the CWL API, each of which can return no more than 1MB of data. That is a 10MB/s aggregate limit on CWL events through the add-on. Both of these limits were significant to us, and we ultimately abandoned the Add-On.