What methods can I use to iterate Events Service settings?

Georgiy_Chigric · ‎02-28-2020

For Internal AppDynamics Audiences

Is the Events Service dropping events?

If the Events Service is dropping events, determine why:

Is the Events Service CPU-bound, memory-bound, I/O-bound, or some combination of these?
Under what loads is the Events Service losing events?

Events Service is losing events...	Then...
Essentially all the time	T-shirt size chosen is clearly too small
During peak load times	Determine whether or not losing some events is tolerable

Sometimes the analytical value of the aggregate of events matters more than any single event. In these cases, dropping some events may be fine.
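
The table above can be read as a small decision rule. The following is only a sketch of that rule, not an AppDynamics tool: the `Sample` record, the 0.9 cutoff for "essentially all the time", and the 0.8 cutoff for "peak load" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observation window (hypothetical record, not an Events Service API)."""
    events_received: int  # events offered during the window
    events_dropped: int   # events lost during the window


def classify_drop_pattern(samples: List[Sample],
                          always_fraction: float = 0.9,
                          peak_fraction: float = 0.8) -> str:
    """Map observed drops onto the rows of the table above.

    always_fraction and peak_fraction are assumed cutoffs, not documented values.
    """
    if not samples:
        raise ValueError("need at least one sample")

    dropping = [s for s in samples if s.events_dropped > 0]
    if not dropping:
        return "no drops observed"

    # Row 1: drops in nearly every window, regardless of load.
    if len(dropping) / len(samples) >= always_fraction:
        return "t-shirt size too small"

    # Row 2: drops confined to windows near the highest observed load.
    peak_load = max(s.events_received for s in samples)
    if all(s.events_received >= peak_fraction * peak_load for s in dropping):
        return "peak-only drops: decide whether the loss is tolerable"

    # Anything else does not fit the table; look at the KPIs instead.
    return "drops not explained by load alone: review the KPIs"
```

Where the per-window counts come from is left open here; any source that reports received and dropped events per interval would do.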

How can I use the KPIs to troubleshoot?

What are the KPIs telling you? Review them to see:

Is CPU running hot a lot?
Is memory usage too high?
Is garbage collection happening too frequently?
Is I/O inadequate?
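
As a rough illustration of the checklist above, the sketch below compares a set of KPI readings against thresholds and reports which ones are running hot. The KPI names and every threshold value are assumptions for the example; they are not AppDynamics defaults and should be tuned for the deployment.

```python
from typing import Dict, List

# Assumed thresholds for illustration only; none of these are documented values.
THRESHOLDS: Dict[str, float] = {
    "cpu_utilization": 0.85,     # fraction of CPU busy
    "memory_utilization": 0.80,  # fraction of memory in use
    "gc_time_fraction": 0.10,    # fraction of wall time spent in garbage collection
    "disk_utilization": 0.90,    # fraction of time the disk is busy
}


def find_hot_kpis(kpis: Dict[str, float]) -> List[str]:
    """Return the names of KPIs whose reading meets or exceeds its threshold.

    KPIs missing from the input are treated as 0.0, i.e. not hot.
    """
    return [name for name, limit in THRESHOLDS.items()
            if kpis.get(name, 0.0) >= limit]
```

A single reading says little; running a check like this over readings sampled during both quiet and peak periods gives a better picture of which resource is the limiting one.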

What is the best scaling response to a deficiency?

Is the best response to a deficiency to scale vertically or to scale horizontally? If you are CPU-bound, for example, can you give the existing nodes more CPU? If you cannot do that, scale horizontally by adding more nodes.

This sort of reasoning applies to deficiencies in any KPI or criterion.

The answer is not always to scale up—you may discover that you are over-provisioned. In that case, you can scale down.
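
The reasoning in this section can be written down as a short decision function. This is a minimal sketch under stated assumptions: the function name, its inputs, and the returned strings are hypothetical, and whether vertical scaling is possible is something you determine yourself for the environment at hand.

```python
from typing import List


def recommend_scaling(hot_kpis: List[str],
                      can_scale_vertically: bool,
                      over_provisioned: bool = False) -> str:
    """Suggest a scaling response for a list of deficient KPIs (hypothetical helper)."""
    if hot_kpis:
        if can_scale_vertically:
            # Prefer adding capacity to the existing nodes when that is possible.
            return "scale up: add capacity for " + ", ".join(hot_kpis)
        # Otherwise fall back to adding more nodes.
        return "scale out: add nodes"
    if over_provisioned:
        # No deficiency and spare capacity: the answer can be to shrink.
        return "scale down"
    return "no change"
```

For example, `recommend_scaling(["cpu_utilization"], can_scale_vertically=False)` returns the scale-out suggestion, which matches the CPU-bound case described above.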

What changes are happening over time?

Sizing must be an iterative process. The sizing that you come up with initially might be right over the longer term, or it might not. Try to get a sense of how similarly or differently traffic is behaving as time goes on.
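
One simple way to get that sense is to compare peak load between two periods. The sketch below assumes you have already collected peak event rates (for example, one value per day) for an earlier and a more recent period; the function names and the 20% tolerance are assumptions for illustration.

```python
from typing import List


def peak_growth(earlier_peaks: List[float], recent_peaks: List[float]) -> float:
    """Relative change in mean peak load between two periods (0.5 means +50%)."""
    if not earlier_peaks or not recent_peaks:
        raise ValueError("both periods need at least one value")
    earlier = sum(earlier_peaks) / len(earlier_peaks)
    recent = sum(recent_peaks) / len(recent_peaks)
    if earlier == 0:
        raise ValueError("earlier mean peak is zero; growth is undefined")
    return (recent - earlier) / earlier


def needs_resizing_review(earlier_peaks: List[float],
                          recent_peaks: List[float],
                          tolerance: float = 0.2) -> bool:
    """Flag a sizing review when peak load moved by more than the assumed tolerance.

    The check is symmetric: a large drop is flagged too, since it may mean
    the deployment is now over-provisioned.
    """
    return abs(peak_growth(earlier_peaks, recent_peaks)) > tolerance
```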

How does the infrastructure of a given deployment affect performance?

Bear in mind that the sizing estimates you obtain from this series of articles are based on testing one particular set of infrastructure—namely, EC2 instances—which may differ in many ways from the infrastructure found in on-prem deployments.

In the field, you may encounter virtual machines or bare metal—each of which may behave differently even if the specs are superficially similar. For example, a given deployment might be on AWS while another might be on GCP—and different clouds behave differently.
