Getting Data In

S3 - Multiple directories of data to be ingested into Splunk Cloud

shoaibalimir
Explorer

Hi Community,

I'm exploring ways to ingest data into Splunk Cloud from an Amazon S3 bucket that has multiple directories and multiple files to be ingested into Splunk.

I have assessed the Generic S3, SQS-based S3, and Data Manager inputs for AWS available in Splunk, but I'm not getting the required outcome.

My use case is given below:

There's an S3 bucket named exampledatastore, which contains a directory named statichexcodedefinition, which in turn contains multiple message IDs and dates.

The example S3 structure is given below:

s3://exampledatastore/statichexcodedefinition/{messageId}/functionname/{date}/* - functionnameattribute

The {messageId} and {date} values are dynamic. I have a start date to begin with, but the messageId varies.

Can you please assist me with how to get this data into Splunk?

Many Thanks!


livehybrid
SplunkTrust

Hi @shoaibalimir 

When you assessed these and didn't get the required outcome, what issue did you run into specifically?

Is this a one-time ingestion of historic files already in S3, or do you want to ingest on an ongoing basis (I assume the latter)?

Personally I would avoid Generic S3, as it relies on checkpoint files and can get messy quickly. SQS-based S3 is the way to go here, I believe.

Check out https://splunk.github.io/splunk-add-on-for-amazon-web-services/SQS-basedS3/ for more details on setting up an SQS-based S3 input. It's also worth noting that the dynamic parts of the path shouldn't be a problem. If you have requirements to put the data into specific indexes depending on the dynamic values, you can configure this when you set up the event notification (https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-event-notifications.html), and you will probably need multiple SQS queues. Alternatively, you could use props/transforms to route to the correct index at ingest time.
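For illustration, here is a minimal boto3 sketch of the event notification side, using the bucket and prefix from this thread; the queue ARN, region, account ID, and notification ID are hypothetical placeholders for your environment. The key point is that prefix filters only match the static leading part of the key, so the dynamic {messageId} and {date} segments don't need to appear in the filter.

    import boto3

    s3 = boto3.client("s3")

    # Note: this call replaces the bucket's existing notification configuration,
    # so include any other configurations you already rely on.
    s3.put_bucket_notification_configuration(
        Bucket="exampledatastore",
        NotificationConfiguration={
            "QueueConfigurations": [
                {
                    "Id": "statichexcodedefinition-to-splunk",  # hypothetical ID
                    "QueueArn": "arn:aws:sqs:us-east-1:123456789012:splunk-ingest",  # hypothetical ARN
                    "Events": ["s3:ObjectCreated:*"],
                    # The prefix covers only the static part of the path; the dynamic
                    # {messageId} and {date} segments fall under it automatically.
                    "Filter": {
                        "Key": {
                            "FilterRules": [
                                {"Name": "prefix", "Value": "statichexcodedefinition/"}
                            ]
                        }
                    },
                }
            ]
        },
    )

The SQS queue also needs an access policy that allows S3 to send messages to it, and the SQS-based S3 input in the add-on then consumes from that queue (see the docs linked above for the full queue setup). If you want different indexes per prefix, repeat the queue configuration with a different prefix and a different queue.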

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

 

shoaibalimir
Explorer

Hi @livehybrid 

I'll assess again with the SQS-based S3 input; I'll need to ingest both historic data and an ongoing data stream.

From my initial observations, I think I'll either need to use multiple SQS-based S3 inputs or use a Lambda function to funnel the notifications into a single SQS-based S3 input, as in the rough sketch below.
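Something like this minimal Lambda sketch is what I have in mind; the target queue URL is a placeholder, and it assumes the forwarded message keeps the S3 event notification format the add-on expects:

    import json
    import os

    import boto3

    sqs = boto3.client("sqs")

    # Placeholder: URL of the single queue the SQS-based S3 input reads from.
    TARGET_QUEUE_URL = os.environ["TARGET_QUEUE_URL"]


    def handler(event, context):
        # Triggered directly by S3; forward the notification body unchanged so the
        # bucket/key details remain parseable downstream.
        sqs.send_message(
            QueueUrl=TARGET_QUEUE_URL,
            MessageBody=json.dumps(event),
        )
        return {"forwarded_records": len(event.get("Records", []))}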

Please let me know if there's any alternative to this approach.

Thanks!
