Getting Data In

S3 - Data from multiple directories to be ingested into Splunk Cloud

shoaibalimir
Explorer

Hi Community,

I'm exploring ways to ingest data into Splunk Cloud from an Amazon S3 bucket that has multiple directories and multiple files to be ingested into Splunk.

Now, I have assessed the Generic S3, SQS-based S3, and Data Manager inputs for AWS available on Splunk, but I am not getting the required outcome.

My use case is given below:

There's an S3 bucket named exampledatastore. Inside it is a directory named statichexcodedefinition, which contains multiple message IDs and dates.

The example S3 structure is given below:

s3://exampledatastore/statichexcodedefinition/{messageId}/functionname/{date}/* - functionnameattribute

The {messageId} and {date} values are dynamic. I have a start date to begin with, but the messageId varies.

Can you please assist me with how to get this data into Splunk?

Many Thanks!


livehybrid
SplunkTrust

Hi @shoaibalimir 

When you assessed these and didn't get the required outcome, what was the issue you had, specifically?

Is this a one-time ingestion of historic files already in S3, or do you want to ingest on an ongoing basis? (I assume the latter.)

Personally, I would avoid Generic S3 as it relies on checkpoint files and can get messy quickly. SQS-based S3 is the way to go here, I believe.
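For context, the SQS-based S3 input polls an SQS queue that receives S3 event notifications, so the bucket needs a notification configuration along these lines. This is a rough sketch with placeholder values for the region, account ID, and queue name:

{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:splunk-ingest-queue",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "statichexcodedefinition/" }
          ]
        }
      }
    }
  ]
}

Because the filter is a prefix, everything under statichexcodedefinition/ is picked up no matter what the {messageId} and {date} segments are. You can apply it with aws s3api put-bucket-notification-configuration --bucket exampledatastore --notification-configuration file://notification.json.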

Check out https://splunk.github.io/splunk-add-on-for-amazon-web-services/SQS-basedS3/ for more details on setting up the SQS-based S3 input. It's also worth noting that the dynamic parts of the path shouldn't be a problem.

If you have requirements to put events into specific indexes depending on the dynamic values, then you can configure this when you set up the event notifications (https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html), and you will probably need multiple SQS queues. Alternatively, you could use props/transforms to route to the correct index at ingest time, as in the sketch below.
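For the props/transforms route, a minimal sketch might look like this. The sourcetype stanza and the idx_ index prefix are assumptions for illustration; the regex pulls the {messageId} segment out of the S3 source path, and every target index has to exist already:

# props.conf - attach the routing transform to whatever sourcetype your input assigns
[aws:s3]
TRANSFORMS-route_by_messageid = route_by_messageid

# transforms.conf - set the destination index from the messageId path segment
[route_by_messageid]
SOURCE_KEY = MetaData:Source
REGEX = statichexcodedefinition/([^/]+)/functionname/
DEST_KEY = _MetaData:Index
FORMAT = idx_$1

On Splunk Cloud, index-time transforms like this have to be applied where the data is ingested (for example on the Inputs Data Manager or via an app submitted to the platform); they won't work at search time.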

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

shoaibalimir
Explorer

Hi @livehybrid 

I'll assess again with the SQS-based S3 input, and I'll need to ingest both the historic data and the ongoing data stream.

From my initial observations, I think I'll either need to use multiple SQS-based S3 inputs, or use Lambda to funnel the notifications into a single SQS-based S3 input, roughly as sketched below.
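To illustrate the Lambda option (the SPLUNK_SQS_QUEUE_URL environment variable is a placeholder, and I'm assuming the function is triggered directly by S3 ObjectCreated events, so the payload already has the same Records shape the SQS-based S3 input reads from the queue):

import json
import os

import boto3

# Placeholder: URL of the single queue polled by the Splunk SQS-based S3 input
QUEUE_URL = os.environ["SPLUNK_SQS_QUEUE_URL"]

sqs = boto3.client("sqs")


def lambda_handler(event, context):
    # The S3 trigger delivers the same {"Records": [...]} notification shape
    # that S3 would write to SQS directly, so forward the payload unchanged.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))
    return {"forwarded": len(event.get("Records", []))}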

Please let me know if there's any other alternative to this assumption.

Thanks!
