Datamodel To Accelerate Billing Data: Help With Best Practices

paimonsoror · 11-20-2019

I have been working on a data model for AWS billing information. It is an enhancement of the one used by the AWS app, since this one leverages the new AWS CUR (Cost and Usage Report) bills.

For those who aren't aware, AWS writes CUR billing every day, and each day it rewrites the month-to-date costs to S3 until the next month begins. That means that to get the most up-to-date invoices, you need to look at the latest assembly ID for each month.

All of my billing data goes into the amazon_billing index, and I have a lookup that runs to get the latest `AssemblyId` by month.
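For context, a lookup like this could be populated by a scheduled search along the following lines. This is only a sketch under assumptions: `BillingPeriod` and `indexed` are illustrative field names, the month is derived from `_time` (which assumes `_time` reflects the usage date), and "latest" is taken to mean the assembly that was indexed most recently for that month.

```spl
index=amazon_billing sourcetype=aws:billing:cur
| eval BillingPeriod=strftime(_time, "%Y-%m"), indexed=_indextime
| stats max(indexed) AS indexed by BillingPeriod AssemblyId
| sort 0 - indexed
| dedup BillingPeriod
| fields BillingPeriod AssemblyId
| outputlookup aws_billing_assemblyId.csv
```

Sorting by index time and then deduplicating on the month keeps one row per month, the one belonging to the most recently indexed assembly.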

My data model has been created, and the purpose is to accelerate the data and get it visualized for folks who want to see their billing information. There is a single search-driven dataset in the model, and it looks like this:

index=amazon_billing sourcetype=aws:billing:cur [ | inputlookup aws_billing_assemblyId.csv | return AssemblyId]
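One thing worth double-checking in this search: `return` passes back only the first row of the subsearch unless it is given a count, so with one lookup row per month only a single AssemblyId would reach the outer search. A variant that passes every row might look like the sketch below. The count of 12 is an arbitrary upper bound chosen for illustration, not something taken from the real lookup.

```spl
index=amazon_billing sourcetype=aws:billing:cur
    [| inputlookup aws_billing_assemblyId.csv
     | return 12 AssemblyId]
```

Ending the subsearch with `| fields AssemblyId | format` instead of `return` should have the same effect without a hard-coded count.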

I am having trouble getting this data model fully accelerated. The summary range is set to -6mon, with a backfill range to match. The maximum summarization search time is set to 1h, with 6 concurrent searches allowed. The schedule is set to */5 * * * *.

As expected, the acceleration build is slow because of the sheer amount of data over the last 6 months. Over the past few days it has reached 76% acceleration, with 1.91 GB on disk (52 buckets). The problem is that it now looks like it is stuck.

Looking at the scheduler searches, I can see the acceleration running, but it is only returning 52 results per run. Running my searches with 'summaries only' produces invalid results.
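To show what I mean by comparing against 'summaries only', a check along these lines puts the accelerated and full event counts side by side per month. It is a sketch: `AWS_Billing_CUR` is a placeholder for the data model name, not the real one, and the time range is whatever the time picker is set to.

```spl
| tstats summariesonly=true count AS accelerated from datamodel=AWS_Billing_CUR by _time span=1mon
| append
    [| tstats summariesonly=false count AS total from datamodel=AWS_Billing_CUR by _time span=1mon]
| stats sum(accelerated) AS accelerated sum(total) AS total by _time
```

Months where `accelerated` is empty or well below `total` are the ones the summaries have not covered yet. The `summariesonly=false` half falls back to raw events, so it can be slow over a 6 month range.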

Could use some assistance on finding the "right" way to do this. Thanks!!

Edit: 1025EST
One thing I should add to help this out:
Goals: I want to be able to traverse billing data quickly and present it in a dashboard so folks can see their cost per account, etc. I chose a data model because I need the "base" data accelerated, and then I need to be able to slice and dice it depending on what I am trying to visualize (cost by account, cost by service, etc.).
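As an illustration of the kind of slicing I have in mind, a dashboard panel for cost by account could be driven by a `tstats` search like the one below. The names are assumptions for the sake of the example: `AWS_Billing_CUR` stands in for the data model, `CUR` for its root dataset, and `UnblendedCost` and `UsageAccountId` for whatever the cost and account fields are actually called.

```spl
| tstats summariesonly=true sum(CUR.UnblendedCost) AS cost
    from datamodel=AWS_Billing_CUR.CUR
    by CUR.UsageAccountId
| rename CUR.UsageAccountId AS account
| sort - cost
```

Swapping the `by` field for a service or product field would give cost by service from the same accelerated base data.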
