
Why do the "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype?

smitra_splunk
Splunk Employee

Hi,

This issue relates to why "in-flight" messages keep piling up in the SQS-based S3 input's queue for the aws:config sourcetype.

I have found a correlation between the SQS queue message backlog and a warning message in the splunk_ta_aws_aws_sqs_based_s3 logs.

The Splunk AWS Add-on is NOT deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that no S3 object is referenced in that message.

2018-02-07 16:07:35,559 level=WARNING pid=24549 tid=Thread-2 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_process:248 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019655.52 message_id="67c27878-01d5-4896-bba7-4df1d9f6946c" job_id=0702de8f-9e66-4f1c-9608-644d6d3cc12b ttl=300 | message="There's no files need to be processed in this message."   

However, the Add-on IS deleting messages from the SQS queue whenever there is a log entry like the one below, which indicates that the message does reference an S3 object.

2018-02-07 16:07:34,815 level=INFO pid=24549 tid=Thread-1 logger=splunk_ta_aws.modinputs.sqs_based_s3.handler pos=handler.py:_index_summary:351 | start_time=1518019605 datainput="AWS-Config-debug", created=1518019654.27 message_id="b9d0d088-3e7c-40c2-99ac-a66fba6adf20" job_id=5bb3a90a-ff87-41fe-9633-91c67621e8e4 ttl=300 | message="Sent data for indexing." last_modified="2018-02-07T16:07:23Z" key="s3://mycompany-security-config-bucket-us-east-1/mycompany.IO-config-logs/AWSLogs/567090277256/Config/us-east-1/2018/2/7/ConfigSnapshot/567090277256_Config_us-east-1_ConfigSnapshot_20180207T160722Z_11fb9f0e-00b0-42f5-a5d3-3462d8814f9b.json.gz" size=48526
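
For context, the add-on's decision is driven by the SQS message body: a message that references S3 objects is processed and then deleted, while one that references none is only logged and left in flight. As a stopgap outside the add-on, a small script along the following lines could drain those "empty" messages manually. This is only a sketch: the queue URL is a placeholder, and the JSON fields it checks (an SNS "Message" wrapper, S3 object references) are assumptions, so inspect a sample message body from your own queue first.

import json

import boto3

# Placeholder queue URL and region -- replace with your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-config-queue"
sqs = boto3.client("sqs", region_name="us-east-1")


def references_s3_object(body):
    """Return True if the message body appears to reference an S3 object."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    # SNS-wrapped notifications carry the real payload in "Message".
    if isinstance(doc, dict) and isinstance(doc.get("Message"), str):
        try:
            doc = json.loads(doc["Message"])
        except ValueError:
            pass
    text = json.dumps(doc)
    # Crude check: look for any S3 object reference in the payload.
    return "s3ObjectKey" in text or '"s3"' in text


while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
    messages = resp.get("Messages", [])
    if not messages:
        break
    for msg in messages:
        if not references_s3_object(msg["Body"]):
            # Delete the "no files need to be processed" messages the add-on leaves in flight.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])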

Why can't the AWS TA delete queue messages when there is no S3 bucket mentioned in the message?

Could this be a bug in the Splunk AWS Add-on, or can it be alleviated by a queue tuning setting in AWS SQS and/or the Splunk AWS TA?

Thanks in advance for any hints!

best regards,
Shreedeep

sylim_splunk
Splunk Employee

This is what I found, and I was able to improve the situation by tweaking the code. By default the input is configured to check its credentials every 30 seconds. Once expiry is detected, the modular input goes down in a non-graceful way, leaving many entries "in flight".
I changed the threshold on line 76 below to 14400, which means it checks every 4 hours. This reduced the in-flight count by more than 95%.

etc/apps/Splunk_TA_aws/bin/splunksdc/config.py

74     def has_expired(self):
75         now = time.time()
76         if now - self._last_check > 30:  # default: re-check credentials every 30 seconds
77             self._last_check = now
78             self._has_expired = self._check()
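
With the tweak applied, line 76 reads roughly as follows (the rest of the method is unchanged):

76         if now - self._last_check > 14400:  # was 30; re-check credentials every 4 hours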


scuba_steve
Engager

Did you ever determine the cause of this issue?


smitra_splunk
Splunk Employee

No.
But I have observed that this problem can be avoided by reading the data source directly from SQS, where possible, instead of using the SQS-based S3 input.
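
To illustrate what consuming directly from SQS looks like in principle (within the add-on, that corresponds to the generic SQS input type rather than SQS-based S3): each message body is the payload itself, so it can be processed and deleted in one pass, with nothing accumulating in flight. A minimal sketch, with a placeholder queue URL and boto3 standing in for the add-on's own collector:

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
# Placeholder queue URL -- replace with your Config notification queue.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/config-notifications"

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print(msg["Body"])  # hand the raw notification to your indexing pipeline here
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])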
