I am running the Splunk Add-On for AWS, updated to version 4.0.0 as of tonight, and I'm mostly interested in CloudWatch Logs events. I understand that each input has a polling interval, and I've set mine to 60 seconds for a sample log group. When I run a real-time search with that log group as the source over a 5-minute window, the initial results come from already-indexed events, usually 30-45 seconds old. No new events stream into the search, and after 5 minutes the window is completely empty. If I refresh the search, it shows a set of events that should have qualified for the real-time window, and then they age out again.
I can run real-time searches against other sources, so I don't see it being an issue of insufficient permissions for my role. I can't find any documentation that indicates these sources wouldn't be visible to real-time searches. Am I doing something wrong, or is this a limitation of the add-on's design?
We now use the Splunk HTTP Event Collector (HEC). Even a single-node HEC configuration can ingest a higher volume of log data than the rate-limited Splunk Add-On for AWS, and events arriving via HEC are visible to real-time searches. We ultimately configured an auto-scaling group of HEC nodes in our AWS account. Our event pipeline starts with a CloudWatch Logs subscription that sends each log group of interest to a single Kinesis stream; from there, a Python AWS Lambda function consumes the stream and forwards events to HEC.
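For reference, here is a rough sketch of what that Lambda does, assuming a hypothetical HEC endpoint and token supplied via environment variables; the real function also has batching, retries, error handling, and TLS/certificate handling that are omitted here. CloudWatch Logs subscription data arrives on each Kinesis record base64-encoded and gzipped, and HEC expects timestamps in seconds rather than the milliseconds CloudWatch uses.

```python
import base64
import gzip
import json
import os
import urllib.request

# Placeholders: point these at your own HEC endpoint and token.
HEC_URL = os.environ.get("HEC_URL", "https://hec.example.internal:8088/services/collector/event")
HEC_TOKEN = os.environ["HEC_TOKEN"]


def lambda_handler(event, context):
    batch = []
    for record in event["Records"]:
        # Kinesis delivers the CloudWatch Logs payload base64-encoded and gzipped.
        payload = gzip.decompress(base64.b64decode(record["kinesis"]["data"]))
        data = json.loads(payload)

        # Subscription filters also emit CONTROL_MESSAGE records; skip them.
        if data.get("messageType") != "DATA_MESSAGE":
            continue

        for log_event in data["logEvents"]:
            batch.append(json.dumps({
                "time": log_event["timestamp"] / 1000.0,  # CWL ms -> HEC seconds
                "source": data["logGroup"],
                "sourcetype": "aws:cloudwatchlogs",
                "event": log_event["message"],
            }))

    if not batch:
        return {"sent": 0}

    # HEC accepts multiple events in a single POST as concatenated JSON objects.
    req = urllib.request.Request(
        HEC_URL,
        data="\n".join(batch).encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return {"sent": len(batch)}
```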
We've had to manually bump the shard count on that single Kinesis stream to handle some volume spikes. We will probably switch to Kinesis Firehose eventually, which handles scaling automatically.
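For what it's worth, the manual shard bump is a one-liner with boto3; the stream name and target count below are placeholders for our setup, not the actual values we used.

```python
import boto3

# Sketch of the manual reshard we ran during volume spikes.
kinesis = boto3.client("kinesis")
kinesis.update_shard_count(
    StreamName="cwl-to-splunk",       # placeholder stream name
    TargetShardCount=4,               # e.g. doubling from 2 shards
    ScalingType="UNIFORM_SCALING",    # split/merge shards evenly
)
```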
I've noticed the same thing, and I can't find any information pointing me to a reason for it.
I manually played with some of the time-related fields in the plugin config, and nothing seemed to affect the delay.
I'd be curious to find out why this is happening and how to fix it (if it can be fixed); otherwise this plugin may be useless to us.
We never got an answer to the "why" question. We also uncovered another limitation: this version of the Add-On uses a method for ingesting CloudWatch Logs that is constrained by a hard rate limit by AWS. AWS limits each account to 10 CWL log detail requests per second through the CWL API, each of which can return no more than 1MB of data. That is a 10MB/s aggregate limit on CWL events through the add-on. Both of these limits were significant to us, and we ultimately abandoned the Add-On.