First, check out this white paper we put together that walks you through ways to collect data from AWS into Splunk: https://www.splunk.com/en_us/form/getting-data-into-gdi-splunk-from-aws.html. Next, if I were to set this up, I would use a heavy forwarder (HF) in AWS with the Splunk TA for AWS installed, then use an EC2 IAM role to authenticate and collect the data. The HF can then forward the data to your on-prem Splunk deployment over the standard forwarding port (TCP 9997).
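If you go the HF route, the forwarding piece is a small outputs.conf on the heavy forwarder. A minimal sketch, where idx.example.com is a hypothetical placeholder for your on-prem indexer:

```ini
# $SPLUNK_HOME/etc/system/local/outputs.conf on the heavy forwarder
[tcpout]
defaultGroup = onprem_indexers

[tcpout:onprem_indexers]
# Replace idx.example.com with your actual on-prem indexer(s);
# 9997 is the standard Splunk-to-Splunk receiving port.
server = idx.example.com:9997
```

The matching receiver side is just an enabled receiving port (9997) on the on-prem indexers.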
For higher-volume data, I would recommend using Kinesis Data Firehose (KDF) or Lambda functions to send the data via the HTTP Event Collector (HEC) to your Splunk deployment. This requires a public-facing IP (for KDF) and a properly signed SSL certificate. You may need to either set up an HF tier or send directly to your indexers behind a load balancer acting as the HEC endpoint.
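Whichever sender you use, the payload shape HEC expects is the same. A minimal sketch in Python of building one HEC event; the URL and token here are hypothetical placeholders, and the actual send is left commented out since it needs a reachable endpoint:

```python
import json

# Hypothetical values -- replace with your own load balancer / HF endpoint
# and a real HEC token from your Splunk deployment.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_event(message, sourcetype="aws:firehose", index="aws"):
    """Build the JSON body HEC expects for a single event."""
    return {
        "event": message,
        "sourcetype": sourcetype,
        "index": index,
    }

payload = build_hec_event({"action": "test"})
headers = {"Authorization": f"Splunk {HEC_TOKEN}"}

# To actually send (requires the `requests` package and a reachable,
# properly certificated endpoint):
# requests.post(HEC_URL, headers=headers, data=json.dumps(payload))
print(json.dumps(payload))
```

KDF posts a similar JSON body for you; the main thing you control there is the endpoint URL, the token, and the certificate it must be able to validate.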
Cost: The cost of collecting the data comes down to a few factors. First is the amount of data sent over the Internet Gateway back to your data center. You can estimate this with the AWS calculator: https://calculator.s3.amazonaws.com/index.html. Next is the number of API calls Splunk makes to collect the events; the polling interval can be tuned in inputs.conf (or via the inputs in the UI). API-call charges are usually low, but the volume of data and the method of collecting it will have the biggest impact on the overall cost.
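The data-transfer piece is simple arithmetic. A quick sketch; the per-GB rate below is an illustrative assumption, so check current AWS egress pricing for your region before relying on it:

```python
# Assumed rate, USD per GB out to the internet -- verify against
# current AWS pricing for your region.
EGRESS_RATE_PER_GB = 0.09

def monthly_egress_cost(gb_per_day, rate=EGRESS_RATE_PER_GB, days=30):
    """Estimate monthly cost of Splunk-bound data leaving via the
    Internet Gateway: daily volume * days * per-GB rate."""
    return gb_per_day * days * rate

# e.g. 50 GB/day sent back to the data center:
print(round(monthly_egress_cost(50), 2))
```

Running the same numbers through the AWS calculator linked above is still the authoritative check, since rates vary by region and tier.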