Why do "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype?

smitra_splunk
Splunk Employee

Hi,

This issue concerns why "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype.

I have found a correlation between the SQS queue message backlog and a warning message in the splunk_ta_aws_aws_sqs_based_s3 logs.

The Splunk Add-on for AWS is NOT deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that there are no S3 files to process in that message.

2018-02-07 16:07:35,559 level=WARNING pid=24549 tid=Thread-2 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:248 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019655.52 message_id="67c27878-01d5-4896-bba7-4df1d9f6946c" job_id=0702de8f-9e66-4f1c-9608-644d6d3cc12b ttl=300 | message="There's no files need to be processed in this message."   

However, the add-on IS deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that the message does reference an S3 object.

2018-02-07 16:07:34,815 level=INFO pid=24549 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_index_summary:351 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019654.27 message_id="b9d0d088-3e7c-40c2-99ac-a66fba6adf20" job_id=5bb3a90a-ff87-41fe-9633-91c67621e8e4 ttl=300 | message="Sent data for indexing." last_modified="2018-02-07T16:07:23Z" key="s3://mycompany-security-config-bucket-us-east-1/mycompany.IO-config-logs/AWSLogs/567090277256/Config/us-east-1/2018/2/7/ConfigSnapshot/567090277256_Config_us-east-1_ConfigSnapshot_20180207T160722Z_11fb9f0e-00b0-42f5-a5d3-3462d8814f9b.json.gz" size=48526
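
To picture what that means in practice, here is a minimal boto3 sketch of the behavior those two log lines suggest. This is illustrative only, not the add-on's actual code; extract_s3_keys and index_object are hypothetical helpers, and the queue URL is a placeholder.

# The message is deleted only after S3 objects were processed, so "no files"
# notifications stay in flight until their visibility timeout expires.
import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-config-queue"  # placeholder

sqs = boto3.client("sqs", region_name="us-east-1")

def poll_once():
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        keys = extract_s3_keys(body)          # hypothetical helper: S3 objects referenced, if any
        if keys:
            for key in keys:
                index_object(key)             # hypothetical helper: fetch from S3 and index
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        # else: the message is NOT deleted, which matches the WARNING case above
        # and would explain the growing in-flight count.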

Why can't the AWS TA delete queue messages when no S3 object is referenced in the message?

Could this be a bug in the Splunk Add-on for AWS, or can it be alleviated by a queue tuning setting in AWS SQS and/or the Splunk AWS TA?
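
For reference, the "in-flight" count can be watched from the SQS side as well; it corresponds to the ApproximateNumberOfMessagesNotVisible queue attribute, i.e. messages that were received but not yet deleted before their visibility timeout. A small boto3 sketch (queue URL is a placeholder):

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-config-queue"  # placeholder

sqs = boto3.client("sqs", region_name="us-east-1")

attrs = sqs.get_queue_attributes(
    QueueUrl=QUEUE_URL,
    AttributeNames=[
        "ApproximateNumberOfMessages",            # visible (waiting) messages
        "ApproximateNumberOfMessagesNotVisible",  # "in-flight" messages
        "VisibilityTimeout",
    ],
)["Attributes"]

print("waiting   :", attrs["ApproximateNumberOfMessages"])
print("in-flight :", attrs["ApproximateNumberOfMessagesNotVisible"])
print("visibility:", attrs["VisibilityTimeout"], "seconds")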

Thanks in advance for any hints!

best regards,
Shreedeep

sylim_splunk
Splunk Employee

Here is what I found; the situation can be improved by tweaking the code. By default, the add-on checks whether its credentials have expired every 30 seconds. Once expiry is detected, the modular input shuts down ungracefully, leaving many messages "in flight".
I changed the interval to 14400 seconds, i.e. a check every 4 hours, on line 76 below. This reduced the in-flight count by more than 95%.

etc/apps/Splunk_TA_aws/bin/splunksdc/config.py

74 def has_expired(self):
75     now = time.time()
76     if now - self._last_check > 30:    # change 30 to 14400 to check every 4 hours instead of every 30 seconds
77         self._last_check = now
78         self._has_expired = self._check()
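
In isolation, the pattern (with the 4-hour interval applied) boils down to something like the following standalone sketch. The constant name, the class wrapper, and the return line are mine for illustration and are not copied from the shipped config.py:

import time

CHECK_INTERVAL = 14400  # seconds; the shipped code effectively used 30 here

class CredentialWatcher:
    def __init__(self, check):
        self._check = check              # callable that actually tests the credentials
        self._last_check = time.time()
        self._has_expired = False

    def has_expired(self):
        now = time.time()
        if now - self._last_check > CHECK_INTERVAL:
            self._last_check = now
            self._has_expired = self._check()
        return self._has_expired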


scuba_steve
Engager

Did you ever determine the cause of this issue?


smitra_splunk
Splunk Employee

No, unfortunately.
But I have observed that this problem can be avoided by reading the data source directly from SQS, where possible, instead of using the SQS-based S3 input.
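
For anyone weighing that option: the key difference is that a plain SQS consumer indexes the notification body itself and deletes every message it receives, so nothing lingers in flight. A rough boto3 sketch of the idea (index_event is a hypothetical stand-in for sending the payload to Splunk, and the queue URL is a placeholder):

import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-config-queue"  # placeholder

sqs = boto3.client("sqs", region_name="us-east-1")

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        index_event(json.loads(msg["Body"]))  # hypothetical helper: forward payload to Splunk
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])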
