Getting Data In

S3 - Ingesting data from multiple directories into Splunk Cloud

shoaibalimir
Explorer

Hi Community,

I'm exploring ways to ingest data into Splunk Cloud from an Amazon S3 bucket that contains multiple directories and multiple files.

So far I have assessed the Generic S3, SQS-based S3, and Data Manager inputs for AWS available on Splunk, but none of them gave me the required outcome.

My use case is given below:

There's an S3 bucket named exampledatastore containing a directory named statichexcodedefinition, which in turn contains multiple message IDs and dates.

The example S3 structure is given below:

s3://exampledatastore/statichexcodedefinition/{messageId}/functionname/{date}/* - functionnameattribute

The {messageId} and {date} values are dynamic. I have a start date to begin with, but the messageId varies.

Could you please assist me with how to get this data into Splunk?

Many Thanks!


livehybrid
SplunkTrust

Hi @shoaibalimir 

When you assessed these and didn't get the required outcome - what issue did you run into, specifically?

Is this a one-time ingestion of historic files already in S3, or do you want to ingest on an ongoing basis (I assume the latter)?

Personally I would avoid Generic S3, as it relies on checkpoint files and can get messy quickly. SQS-based S3 is the way to go here, I believe.

Check out https://splunk.github.io/splunk-add-on-for-amazon-web-services/SQS-basedS3/ for more details on setting up the SQS-based S3 input. It's also worth noting that the dynamic parts of the path shouldn't be a problem. If you have a requirement to put the data into specific indexes depending on the dynamic values, you can configure this when you set up the event notifications (https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html), and you will probably need multiple SQS queues. Alternatively, you could use props/transforms to route to the correct index at ingest time - rough sketches of both options are below.
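For illustration, here's roughly what a bucket notification configuration could look like (a minimal sketch - the region, account ID, and queue name are placeholders I've made up). Because the filter matches on the static leading prefix, a single rule covers every dynamic {messageId} and {date} underneath it:

```json
{
  "QueueConfigurations": [
    {
      "Id": "statichexcodedefinition-to-splunk",
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:splunk-ingest-queue",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "statichexcodedefinition/" }
          ]
        }
      }
    }
  ]
}
```

And for the props/transforms route, something along these lines (the sourcetype, transform name, and index name are my assumptions - adjust them to whatever your input actually assigns, and the target index must already exist):

```
# props.conf - attach the routing transform to the sourcetype your S3 input assigns
[aws:s3]
TRANSFORMS-route_by_path = route_statichexcode

# transforms.conf - route events whose source path contains the static prefix
[route_statichexcode]
SOURCE_KEY = MetaData:Source
REGEX = statichexcodedefinition/
DEST_KEY = _MetaData:Index
FORMAT = hexcode_events
```

In Splunk Cloud these would need to live on the ingest tier, e.g. deployed via an app.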

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

shoaibalimir
Explorer

Hi @livehybrid 

I'll assess the SQS-based S3 input again. I'll need to ingest both historic data and the ongoing data stream.

Based on my initial observations, I think I'll either need multiple SQS-based S3 inputs, or a Lambda function to funnel the notifications into a single SQS-based S3 input (rough sketch below).
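This is a minimal sketch of the Lambda fan-in I have in mind (untested - the environment variable name is made up, and I'm assuming the SQS-based S3 input will accept the forwarded body, since it's the same notification JSON S3 would have sent to the queue directly):

```python
import json
import os

import boto3

sqs = boto3.client("sqs")

# Hypothetical env var pointing at the single queue the Splunk
# SQS-based S3 input polls.
QUEUE_URL = os.environ["SPLUNK_INGEST_QUEUE_URL"]

def handler(event, context):
    # S3 invokes this function with its standard event notification
    # payload ({"Records": [...]}); forward it unchanged so the queue
    # receives the same format S3 would have delivered directly.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))
    return {"forwarded": len(event.get("Records", []))}
```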

Please let me know if there's any other alternative to this assumption.

Thanks!
