Why do "in-flight" messages keep piling up in the queue of an SQS-based S3 input for the aws:config sourcetype?

smitra_splunk
Splunk Employee

Hi,

This issue concerns why "in-flight" messages keep piling up in the queue of an SQS-based S3 input for the aws:config sourcetype.

I have found a correlation between the SQS queue message backlog and a warning message in the splunk_ta_aws_aws_sqs_based_s3 logs.

The Splunk AWS AddOn is NOT deleting messages from the SQS queue whenever a log entry like the one below appears, indicating that there is no S3 bucket in that message.

2018-02-07 16:07:35,559 level=WARNING pid=24549 tid=Thread-2 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:248 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019655.52 message_id="67c27878-01d5-4896-bba7-4df1d9f6946c" job_id=0702de8f-9e66-4f1c-9608-644d6d3cc12b ttl=300 | message="There's no files need to be processed in this message."   

However, the AddOn IS deleting messages from the SQS queue whenever a log entry like the one below appears, indicating that there is an S3 bucket in that message.

2018-02-07 16:07:34,815 level=INFO pid=24549 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_index_summary:351 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019654.27 message_id="b9d0d088-3e7c-40c2-99ac-a66fba6adf20" job_id=5bb3a90a-ff87-41fe-9633-91c67621e8e4 ttl=300 | message="Sent data for indexing." last_modified="2018-02-07T16:07:23Z" key="s3://mycompany-security-config-bucket-us-east-1/mycompany.IO-config-logs/AWSLogs/567090277256/Config/us-east-1/2018/2/7/ConfigSnapshot/567090277256_Config_us-east-1_ConfigSnapshot_20180207T160722Z_11fb9f0e-00b0-42f5-a5d3-3462d8814f9b.json.gz" size=48526
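To check this correlation over a whole log file rather than line by line, the key=value layout of the two entries above can be parsed and grouped by message_id. The following is a minimal sketch that assumes every handler line follows the format shown here; `parse_log_line` and `outcome_by_message` are hypothetical helper names, and only the two message texts quoted above are recognized.

```python
import re
from collections import Counter

# One key=value field; the value is either "double-quoted" or a bare token.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

# Message texts copied from the two log entries above (assumed to be stable).
NO_FILES = "There's no files need to be processed in this message."
INDEXED = "Sent data for indexing."


def parse_log_line(line):
    """Return the key=value fields of one sqs_based_s3 handler log line."""
    fields = {}
    for match in FIELD.finditer(line):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


def outcome_by_message(lines):
    """Map each SQS message_id to the handler outcome logged for it.

    Lines without a message_id are skipped; if a message_id appears
    several times, the last recognized outcome wins.
    """
    outcomes = {}
    for line in lines:
        fields = parse_log_line(line)
        message_id = fields.get("message_id")
        text = fields.get("message")
        if message_id is None:
            continue
        if text == NO_FILES:
            outcomes[message_id] = "no_files"
        elif text == INDEXED:
            outcomes[message_id] = "indexed"
    return outcomes


if __name__ == "__main__":
    sample = [
        '''2018-02-07 16:07:35,559 level=WARNING message_id="67c27878-01d5-4896-bba7-4df1d9f6946c" ttl=300 | message="There's no files need to be processed in this message."''',
        '''2018-02-07 16:07:34,815 level=INFO message_id="b9d0d088-3e7c-40c2-99ac-a66fba6adf20" ttl=300 | message="Sent data for indexing." size=48526''',
    ]
    print(Counter(outcome_by_message(sample).values()))
```

Any iterable of lines works, so an open log file can be passed directly; the "no_files" message IDs can then be compared against the messages that remain in flight in the queue.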

Why can't the AWS TA delete queue messages when there is no S3 bucket mentioned in the message?

Could this be a bug in the Splunk AWS AddOn, or could it be alleviated by a queue tuning setting in AWS SQS and/or the Splunk AWS TA?
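The warning suggests that some queued messages reference no S3 object. One plausible source, offered as an assumption rather than a confirmed diagnosis: AWS Config publishes several notification types to its SNS topic (for example ConfigurationItemChangeNotification), and to our understanding only the delivery-completed types carry `s3Bucket` and `s3ObjectKey` fields. The sketch below checks whether a given SQS message body points at an S3 object. `s3_objects_in_message` is a hypothetical helper, not part of the add-on, and the message layout it expects is assumed.

```python
import json


def s3_objects_in_message(body):
    """Return the (bucket, key) pairs an SQS message body points at.

    Assumptions (not taken from the add-on's code): the queue is subscribed
    to the AWS Config SNS topic, so `body` is usually an SNS envelope whose
    "Message" field holds the AWS Config notification as a JSON string, and
    only notifications with "s3Bucket" and "s3ObjectKey" refer to a file.
    """
    try:
        envelope = json.loads(body)
    except ValueError:
        return []  # body is not JSON at all
    if not isinstance(envelope, dict):
        return []
    # Without an SNS envelope (raw delivery), treat the body as the notification.
    inner = envelope.get("Message", envelope)
    if isinstance(inner, str):
        try:
            inner = json.loads(inner)
        except ValueError:
            return []  # plain-text SNS message: nothing to fetch from S3
    if not isinstance(inner, dict):
        return []
    bucket, key = inner.get("s3Bucket"), inner.get("s3ObjectKey")
    return [(bucket, key)] if bucket and key else []
```

Message bodies could be sampled with boto3's `receive_message` (each returned message has a `Body`). Receiving a message hides it, that is, puts it in flight, until its visibility timeout expires or it is deleted, so a diagnostic read like this briefly adds to the in-flight count itself.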

Thanks in advance for any hints!

Best regards,
Shreedeep

sylim_splunk
Splunk Employee

Here is what I found; I was able to improve the situation by tweaking the code. By default, the add-on checks whether its credentials have expired every 30 seconds. Once expiry is detected, the modular input shuts down ungracefully, leaving many messages "in flight".
I changed the interval to 14400 seconds (a check every 4 hours) in line 76 below. This reduced the number of in-flight messages by more than 95%.

etc/apps/Splunk_TA_aws/bin/splunksdc/config.py

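import time

# Method excerpt (lines 74-78 of config.py)
def has_expired(self):
    now = time.time()
    # Line 76: re-run the credential check only when more than 30 seconds
    # have passed since the previous check.
    if now - self._last_check > 30:
        self._last_check = now
        self._has_expired = self._check()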
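The effect of the interval on line 76 can be illustrated with a small, self-contained version of the same throttling pattern. `ThrottledExpiryCheck`, `count_checks`, and the injectable `check`/`clock` arguments are hypothetical names for this sketch, not the add-on's API; it only mirrors the cached-result logic of `has_expired()` shown above, with a fake clock so no real waiting is needed.

```python
import time


class ThrottledExpiryCheck:
    """Run an expensive expiry check at most once per `interval` seconds.

    Between checks the cached result is returned, mirroring has_expired().
    `check` and `clock` are injectable so the behaviour can be tested.
    """

    def __init__(self, check, interval=30.0, clock=time.time):
        self._check = check               # callable returning True once expired
        self._interval = interval         # seconds between real checks
        self._clock = clock
        self._last_check = float("-inf")  # forces a real check on first call
        self._has_expired = False

    def has_expired(self):
        now = self._clock()
        if now - self._last_check > self._interval:
            self._last_check = now
            self._has_expired = self._check()
        return self._has_expired


def count_checks(interval, duration=14400, step=1.0):
    """Count real checks made when polling every `step` seconds."""
    calls = []
    fake_now = [0.0]

    def check():
        calls.append(fake_now[0])
        return False  # credentials never expire in this simulation

    throttled = ThrottledExpiryCheck(check, interval, clock=lambda: fake_now[0])
    t = 0.0
    while t < duration:
        fake_now[0] = t
        throttled.has_expired()
        t += step
    return len(calls)
```

Polling once per second for four hours, `count_checks(30)` returns 465 (the strict `>` makes the effective period 31 seconds at this granularity), while `count_checks(14400)` returns 1. Fewer real checks mean fewer chances to hit the ungraceful exit described above, which is consistent with the reported drop in in-flight messages; the change reduces how often the shutdown happens rather than making it graceful.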


scuba_steve
Engager

Did you ever determine the cause of this issue?


smitra_splunk
Splunk Employee

Nope.
However, I've observed that this problem can be avoided, where possible, by reading the data source directly from SQS instead of through the SQS-based S3 input.
