
Why do "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype?

smitra_splunk
Splunk Employee

Hi,

This issue relates to why "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype.

I have found a correlation between the SQS queue message backlog and a warning message in the splunk_ta_aws_aws_sqs_based_s3 logs.

The Splunk AWS Add-on is NOT deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that the message references no S3 files to process.

2018-02-07 16:07:35,559 level=WARNING pid=24549 tid=Thread-2 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:248 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019655.52 message_id="67c27878-01d5-4896-bba7-4df1d9f6946c" job_id=0702de8f-9e66-4f1c-9608-644d6d3cc12b ttl=300 | message="There's no files need to be processed in this message."   

However, the Add-on IS deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that the message does reference an S3 object.

2018-02-07 16:07:34,815 level=INFO pid=24549 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_index_summary:351 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019654.27 message_id="b9d0d088-3e7c-40c2-99ac-a66fba6adf20" job_id=5bb3a90a-ff87-41fe-9633-91c67621e8e4 ttl=300 | message="Sent data for indexing." last_modified="2018-02-07T16:07:23Z" key="s3://mycompany-security-config-bucket-us-east-1/mycompany.IO-config-logs/AWSLogs/567090277256/Config/us-east-1/2018/2/7/ConfigSnapshot/567090277256_Config_us-east-1_ConfigSnapshot_20180207T160722Z_11fb9f0e-00b0-42f5-a5d3-3462d8814f9b.json.gz" size=48526

Why can't the AWS TA delete queue messages when no S3 object is referenced in the message?

Could this be a bug in the Splunk AWS Add-on, or can it be alleviated by a queue tuning setting in AWS SQS and/or the Splunk AWS TA?
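
For context on why these messages accumulate: a received SQS message counts as "in-flight" until the consumer explicitly deletes it or its visibility timeout expires, at which point it returns to the queue and is received (and counted) again. Below is a minimal boto3 sketch of that receive/process/delete cycle; the queue URL is hypothetical and not part of the original post.

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
# Hypothetical queue URL, for illustration only.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-config-queue"

resp = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10
)
for msg in resp.get("Messages", []):
    # ... process the notification, e.g. fetch and index the referenced S3 object ...
    # Only this explicit delete removes the message from the queue; otherwise it
    # becomes visible again after the visibility timeout and is re-received.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])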

Thanks in advance for any hints!

best regards,
Shreedeep

sylim_splunk
Splunk Employee

Here is what I found; the situation could be improved by tweaking the code. By default, the add-on is configured to check its credentials every 30 seconds. Once credential expiry is detected, the modinput shuts down non-gracefully, leaving many messages in flight.
I changed the threshold to 14400, i.e. a check every 4 hours (line 76 below). This reduced the in-flight count by more than 95%.

etc/apps/Splunk_TA_aws/bin/splunksdc/config.py

74     def has_expired(self):
75         now = time.time()
76         if now - self._last_check > 30:
77             self._last_check = now
78             self._has_expired = self._check()
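
Applied in place, the change is just the threshold on line 76; here is a sketch of the modified method, assuming no other edits to config.py:

    def has_expired(self):
        now = time.time()
        # Threshold raised from 30 seconds to 14400 seconds (4 hours) so the
        # credential check runs far less often, the modinput is torn down less
        # frequently, and fewer messages are left stranded in flight.
        if now - self._last_check > 14400:
            self._last_check = now
            self._has_expired = self._check()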


scuba_steve
Engager

Did you ever determine the cause of this issue?


smitra_splunk
Splunk Employee

Nope.
But I have observed that this problem can be avoided by reading the data source directly via the SQS input, where possible, instead of the SQS-based S3 input.
