Getting Data In

s3 - Multiple directories data to be ingested into Splunk Cloud

shoaibalimir
Path Finder

Hi Community,

I'm exploring ways to ingest data into Splunk Cloud from a Amazon s3 Bucket which has multiple directories and multiple files to be ingested onto Splunk.

Now, I have assessed the Generic s3, SQS-s3 and the Data Manager Inputs for AWS available on Splunk but am not getting the required outcome.

My use case is given below:

There's a s3 bucket named as exampledatastore, in that there's a directory named as statichexcodedefinition, in that there're multiple message Ids and Dates.

The s3 example structure is given below:

s3://exampledatastore/statichexcodedefinition/{messageId}/functionname/{date}/* - functionnameattribute

Where the {messageId} and the {date} values are dynamic. And I have a start date to begin with but the messageId varies.

Please can you assist me on this on how to get the data into Splunk.

Many Thanks!

Labels (3)
0 Karma

livehybrid
SplunkTrust
SplunkTrust

Hi @shoaibalimir 

When you assessed and didnt get the required outcome - what is the issue you had specifically?

Is this a one-time ingestion of historic files already in S3, or are you wanting to ingest on an ongoing basis (I assume the latter?).

Personally I would avoid Generic-S3 as it relies on checkpoint files and can get messy quickly. SQS based S3 is the way to go here I believe. 

Check out https://splunk.github.io/splunk-add-on-for-amazon-web-services/SQS-basedS3/ for more details on setting up SQS-based-S3 input. Its also worth nothing that the dynamic parts of the path shouldnt be a problem. If you have requirements to put them into specific indexes depending on the dynamic values then you can configure this when you setup the event notification (https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html) and will probably need multiple SQS. Alternatively you could use props/transforms to route to the correct index at ingest time.

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

shoaibalimir
Path Finder

Hi @livehybrid 

I'll assess again with the SQS-s3 Connector, and I'll need to ingest both historic data as well as ongoing data stream.

By the initial observations I think I'll need to use multiple SQS-s3 Connectors or would need to use Lambda to process those into single SQS-s3 Connector.

Please let me know if there's any other alternative to this assumption.

Thanks!

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...