My use case:
I have several very small lambda functions that run hourly and output ~20 lines each time. I've configured a cloudwatch input for each functions log group with a 600s frequency. These are the only things using cloudwatch api.
Problem:
While the setup technically 'works', more often than not, only the first log line of each invocation is indexed. The remaining lines are conspicuously missing. Occasionally, it does get all the log lines.
Root cause:
I looked over the internal logs and I only see normal operations. I think the issue is how the add-on tests for new logs. Each invocation of lambda creates a new log stream. See example my example log stream (some data redacted for privacy)
{
"firstEventTimestamp": 1531317415254,
"lastEventTimestamp": 1531317415254,
"creationTime": 1531317406089,
"uploadSequenceToken": "Atoken",
"logStreamName": "MyLogStream",
"lastIngestionTime": 1531317582675,
"arn": "someARN",
"storedBytes": 0
},
This example is of a log stream that has completed its lambda execution, meaning there are some 20+ lines in there. Yet, we can clearly see that the "EventTimestamp" attributes are identical. Thats odd, lets look at the events.
{
"nextForwardToken": "sometoken",
"events": [
{
"ingestionTime": 1531317415315,
"timestamp": 1531317415254,
"message": "START RequestId somestuff \n"
},
{
"ingestionTime": 1531317430720,
"timestamp": 1531317415255,
"message": "Mylogline \n"
},
.........
<redacted event list>
.........
{
"ingestionTime": 1531317597749,
"timestamp": 1531317582672,
"message": "Final log line \n"
},
{
"ingestionTime": 1531317597749,
"timestamp": 1531317582673,
"message": "END RequestId: somestuff \n"
},
{
"ingestionTime": 1531317597749,
"timestamp": 1531317582673,
"message": "REPORT RequestId: some more stuff"
}
],
"nextBackwardToken": "sometoken"
}
Here can clearly see the each event's timestamp is different, and the last events timestamp does NOT match the "lastEventTimestamp" attribute of the log stream. The "lastEventTimestamp" matches the first event. I believe that the add-on is using the "lastEventTimestamp" to determine if there are more log events to index. This will cause missing logs under certain conditions. In my case, when the lambda executions are still running when it indexes the log stream. Lucky me, since my hourly lambda executions happen on the same interval as the check (*/10), I'm missing nearly all my logs.
Best i can tell, it should use the "lastIngestionTime" attribute which actually seems to indicate some sort of change.
EDIT: I modified the execution interval several times to no avail. Maybe 1/10 streams are ever indexed properly, making this input pretty useless.
... View more