Hi Community,
I'm exploring ways to ingest data into Splunk Cloud from an Amazon S3 bucket that has multiple directories and multiple files to be ingested into Splunk.
I have assessed the Generic S3 and SQS-based S3 inputs, as well as the Data Manager inputs for AWS available on Splunk, but I am not getting the required outcome.
My use case is given below:
There's an S3 bucket named exampledatastore; inside it there's a directory named statichexcodedefinition, which contains multiple message IDs and dates.
The example S3 structure is given below:
s3://exampledatastore/statichexcodedefinition/{messageId}/functionname/{date}/* - functionnameattribute
The {messageId} and {date} values are dynamic. I have a start date to begin with, but the messageId varies.
Could you please assist me with how to get this data into Splunk?
Many Thanks!
When you assessed these and didn't get the required outcome, what issue did you have specifically?
Is this a one-time ingestion of historic files already in S3, or do you want to ingest on an ongoing basis (I assume the latter)?
Personally I would avoid Generic S3, as it relies on checkpoint files and can get messy quickly. SQS-based S3 is the way to go here, I believe.
Check out https://splunk.github.io/splunk-add-on-for-amazon-web-services/SQS-basedS3/ for more details on setting up an SQS-based S3 input. It's also worth noting that the dynamic parts of the path shouldn't be a problem. If you need to route events to specific indexes depending on the dynamic values, you can configure this when you set up the event notifications (https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html), and you will probably need multiple SQS queues (see the sketch below). Alternatively, you could use props/transforms to route to the correct index at ingest time.
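If you go the multiple-queue route, a minimal boto3 sketch of the notification setup could look like the following. The queue ARNs and the messageId prefixes are assumptions for illustration, so substitute your own values; each queue would then back its own SQS-based S3 input.

```python
# Sketch: configure S3 event notifications with prefix filters so that
# objects under different prefixes land in different SQS queues.
import boto3

s3 = boto3.client("s3")

# Hypothetical mapping of key prefix -> SQS queue ARN (assumed values).
routes = {
    "statichexcodedefinition/msg-0001/": "arn:aws:sqs:us-east-1:123456789012:splunk-msg-0001",
    "statichexcodedefinition/msg-0002/": "arn:aws:sqs:us-east-1:123456789012:splunk-msg-0002",
}

# Build one queue configuration per prefix, firing on object creation.
queue_configs = [
    {
        "Id": f"splunk-route-{i}",
        "QueueArn": arn,
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
    }
    for i, (prefix, arn) in enumerate(routes.items())
]

# Apply the notification configuration to the bucket.
s3.put_bucket_notification_configuration(
    Bucket="exampledatastore",
    NotificationConfiguration={"QueueConfigurations": queue_configs},
)
```

One thing to keep in mind: S3 rejects overlapping prefix filters for the same event type, so each prefix must be distinct, and each queue's access policy must allow S3 to publish to it.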
Hi @livehybrid
I'll assess again with the SQS-based S3 input; I'll need to ingest both historic data and an ongoing data stream.
From my initial observations, I think I'll either need multiple SQS-based S3 inputs, or a Lambda function to funnel the notifications into a single queue feeding one input, something like the sketch below.
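Here's a minimal sketch of that Lambda fan-in idea, in case it helps the discussion. The target queue URL is an assumption (passed via a hypothetical TARGET_QUEUE_URL environment variable), and I'm assuming the SQS-based S3 input will accept standard S3 event notification JSON forwarded as-is:

```python
# Sketch of a Lambda fan-in: receive S3 event notifications from multiple
# prefixes and forward them to a single queue consumed by one SQS-based
# S3 input. The records are forwarded unchanged as S3 notification JSON.
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["TARGET_QUEUE_URL"]  # hypothetical env var


def handler(event, context):
    # Re-wrap each record as a standard S3 notification payload so the
    # downstream SQS-based S3 input can parse it.
    for record in event.get("Records", []):
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"Records": [record]}),
        )
```

That would keep a single input to manage on the Splunk side, at the cost of an extra hop and the Lambda invocation overhead.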
Please let me know if there's any other alternative to this assumption.
Thanks!