Hello,
I'm looking to change our indexing architecture.
We have dozens of AWS accounts. We use the Splunk AWS app to ingest the data from an SQS queue. Currently, we have a single SQS-based input for each AWS account, which grabs all of that account's data and assigns the index and a catch-all sourcetype named aws:logbucket.
From there, we route the data to a more specific sourcetype based on the type of data: aws:logbucket is rewritten to aws:cloudwatch:vpcflowlogs, aws:cloudtrail, aws:config, etc.
This has worked well enough for us, but I now have a new requirement.
For each of these AWS accounts, I want a separate index per AWS service, e.g. awsaccount1-vpcflow, awsaccount1-cloudtrail, awsaccount2-vpcflow, etc. We use S2, so storing aws:cloudtrail in the same index as aws:cloudwatch:vpcflow hurts the performance of aws:cloudtrail searches. Searching for aws:cloudtrail data requires us to write all of the aws:cloudwatch:vpcflow data back to disk as well. Because CloudTrail is stored with VPCFlow, this has meant 120x more buckets being written to disk for aws:cloudtrail searches. Splitting these indexes to be more specific would be a huge performance improvement for my Splunk environment.
I would like to use a lookup table that matches on the source of the SQS-based S3 input and sets the index and sourcetype. I am unable to do this using regex and FORMAT, since the bucket names and index names are not a one-to-one match. For example, for s3://acc1/cloudtrail/... I would like the lookup table to route the data to index account1 and sourcetype aws:cloudtrail, and for s3://acc2/config/... I would like it to route to index account2 and sourcetype aws:config.
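To make the behavior I'm after concrete, here is a minimal Python sketch of the mapping. This is only an illustration of the logic, not Splunk configuration. The names ROUTES and route are made up for this example, the two rows are the examples above, and keying on the bucket name plus the first path segment is an assumption about how the lookup would be structured.

```python
from urllib.parse import urlparse

# Hypothetical lookup rows: (bucket, first path segment) -> (index, sourcetype).
# The two entries are the examples from this post; the real table would
# have roughly 300-400 rows.
ROUTES = {
    ("acc1", "cloudtrail"): ("account1", "aws:cloudtrail"),
    ("acc2", "config"): ("account2", "aws:config"),
}


def route(source):
    """Return (index, sourcetype) for an s3:// source, or None if no row matches."""
    parsed = urlparse(source)
    bucket = parsed.netloc  # "acc1" in s3://acc1/cloudtrail/...
    # First segment of the object key, e.g. "cloudtrail".
    first_segment = parsed.path.lstrip("/").split("/", 1)[0]
    return ROUTES.get((bucket, first_segment))
```

In this sketch each event costs one dictionary access no matter how many rows the table has. What I don't know is whether Splunk can do an equivalent lookup at index time, and what it would cost there.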
After that long summary... how do I technically implement this and how will a lookup with ~300-400 different rows affect performance?
Thank you,
Nate