Community Blog
Get the latest updates on the Splunk Community, including member experiences, product education, events, and more!

Kubernetes Horizontal Pod Autoscaling

CaitlinHalla
Splunk Employee

Intro

In a Kubernetes environment, you can scale your application up or down with a simple command, a UI, or automatically with autoscalers. However, to scale successfully, you need to know when you’re hitting scaling limits and if/when your scaling efforts are effective. Otherwise, you might continue to inefficiently use resources or hit application performance issues unnecessarily. In this post, we’ll check out Kubernetes Horizontal Pod Autoscaling (HPA), when you might use HPA, caveats you might hit when scaling pods, and how you can use Splunk Observability Cloud to gain insight into your Kubernetes environment to ensure you’re scaling efficiently and effectively. 

Kubernetes Autoscaling

Autoscaling is an awesome way to adjust the capacity of your Kubernetes environment to match application resource demands with minimal manual intervention. With autoscaling, scalable resources automatically increase or decrease with variable demand. This creates a more elastic, more performant, and more efficient (both in terms of application resource consumption and infrastructure costs) Kubernetes environment.

Kubernetes supports both vertical and horizontal scaling. With vertical scaling (up/down), resources like memory and CPU are adjusted in place (think increasing/decreasing memory for an existing workload). With horizontal scaling (in/out), the number of replicas increases or decreases instead (think increasing/decreasing the number of pods running a workload). Vertical scaling is great for right-sizing your Kubernetes workloads to ensure they have the resources they need. Horizontal scaling is great for dynamically responding to unexpected bursts (or busts) in traffic by distributing the load across more or fewer pods.

Horizontal and vertical autoscaling can be configured at the cluster and/or pod level using Cluster Autoscaling, Vertical Pod Autoscaling, and/or Horizontal Pod Autoscaling. The Horizontal Pod Autoscaler (HPA) is the only autoscaler included by default with Kubernetes, so we’ll keep our focus on HPA for now. 

Horizontal Pod Autoscaling

To scale a Kubernetes workload resource such as a Deployment or StatefulSet to match current resource demand, you can scale it manually, or you can scale it automatically through autoscaling. Scaling up or down automatically to match demand reduces the need for manual intervention and ensures efficient resource use within your Kubernetes infrastructure. If load increases, the HorizontalPodAutoscaler responds by deploying more pods, up to the configured maximum. Conversely, if load decreases, it instructs the workload resource to scale down, as long as the number of pods is above the configured minimum.
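To make that scale-out/scale-in behavior a bit more concrete, here's a minimal Python sketch of the core replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default), the largest proposal winning when several metrics are configured, and the result kept between the min and max replica counts. The function name and arguments are hypothetical, and the sketch deliberately skips details like missing metrics, pods that aren't ready yet, and scale-down stabilization windows.

```python
import math
from typing import Dict, Tuple


def desired_replicas(current_replicas: int,
                     metrics: Dict[str, Tuple[float, float]],
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (hypothetical helper).

    `metrics` maps a metric name to (current_value, target_value), for
    example average CPU utilization vs. the target utilization in percent.
    """
    proposals = []
    for current, target in metrics.values():
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric proposes no change.
            proposals.append(current_replicas)
        else:
            # Scale the replica count proportionally, rounding up.
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposal wins.
    desired = max(proposals) if proposals else current_replicas
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# One pod at 180% average CPU utilization against a 50% target:
print(desired_replicas(1, {"cpu": (180.0, 50.0)}, min_replicas=1, max_replicas=4))  # 4
```

The ratio-based formula is why a single overloaded pod can jump straight to several replicas in one step, and why the max replica count ends up being the real ceiling.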

Horizontal Pod Autoscaling Gotchas

Automatically scaling pods is a hugely beneficial feature of Kubernetes, but there are some caveats when implementing Horizontal Pod Autoscaling. Here are some things to be aware of: 

  • Metric lag: because the HorizontalPodAutoscaler periodically queries the Metrics API for resource usage to inform its scaling decisions, there can be a lag between a change in usage and the resulting scaling action. By default, HPA checks metrics every 15 seconds.
  • Vertical scaling conflicts: VPA and HPA shouldn’t be used together when based on the same metrics – this can lead to competing and conflicting scaling decisions. 
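  • Resource limits: if requests and limits aren't properly configured, HPA might not be able to scale out. In particular, HPA calculates utilization as a percentage of the containers' resource requests, so it can't compute utilization (or scale on it) for pods whose containers don't set requests. Fine-tuning thresholds can be tricky and requires monitoring resource usage against requests and limits.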
  • Resource competition: new pods spinning up can compete for resources and can also take time to initialize and stabilize.
  • Application compatibility: not all applications can easily scale horizontally (single-threaded applications, those with order-dependent queues, databases, etc.). Before implementing HPA, you need to determine whether your application is compatible.
  • DaemonSets: HPA doesn’t apply to DaemonSets – if you want to scale your DaemonSet, you probably should scale your node pool instead. 
  • Dependency bottlenecks: external dependencies (such as 3rd party APIs) might not scale at the same rate or at all – you should have a plan to scale those as well.
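The resource limits caveat comes down to how HPA measures utilization: as a percentage of resource requests, not of node capacity or limits. Here's a hedged Python sketch for CPU, assuming a hypothetical input of (usage, request) millicore pairs per container. It divides total usage by total requests across pods and returns None when any container has no request, since utilization is then undefined and HPA won't act on that metric.

```python
from typing import List, Optional, Tuple

# (usage_millicores, request_millicores) for one container;
# request is None when the container spec sets no CPU request.
Container = Tuple[float, Optional[float]]


def average_cpu_utilization(pods: List[List[Container]]) -> Optional[float]:
    """Average CPU utilization in percent of requests across pods.

    Simplified model: total usage divided by total requests. Returns None
    if any container lacks a request, because utilization is then undefined.
    """
    total_usage = 0.0
    total_requests = 0.0
    for containers in pods:
        for usage, request in containers:
            if request is None:
                return None  # No request: nothing to compare usage against.
            total_usage += usage
            total_requests += request
    if total_requests <= 0:
        return None
    return 100.0 * total_usage / total_requests


# Two single-container pods, each requesting 200m CPU:
print(average_cpu_utilization([[(150.0, 200.0)], [(50.0, 200.0)]]))  # 50.0
print(average_cpu_utilization([[(150.0, 200.0)], [(50.0, None)]]))   # None
```

One side effect worth noting: because the denominator is the request, lowering a request makes the same usage look like higher utilization, which changes when HPA decides to scale.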

Let’s HPA

Now that we know what Horizontal Pod Autoscaling is and some things to be aware of when working with HPA, let’s see it in action. 

We have a PHP/Apache Kubernetes deployment in the apache namespace that is exporting OpenTelemetry data to Splunk Observability Cloud. Our deployment creates a new StatefulSet with a single replica. Let's jump into the Splunk Observability Cloud Kubernetes Navigator, which we explored in a previous post.

In the Navigator, if we filter down to our cluster and the apache namespace, we can see that we currently have only one pod in our node:

[Screenshot: Kubernetes Navigator showing one pod in the apache namespace]

The pod is receiving significant load, and for the purposes of this HPA example, we've deliberately limited the resources for each Apache pod. We can see spikes in CPU and memory usage that leave the pod with insufficient resources:

[Screenshot: CPU usage for the Apache pod]

[Screenshot: memory usage for the Apache pod]

The lack of required resources is throwing containers into a CrashLoopBackOff. For a minute we’ll have 1 active container: 

[Screenshot: one active container]

Then suddenly, that container will crash and we’ll have 0 active containers before it attempts to restart again:

[Screenshot: zero active containers]

Not only can we see these containers starting and stopping in real-time, but the restarts triggered an AutoDetect detector that would have notified our team of an issue:

[Screenshot: AutoDetect detector triggered by the container restarts]
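Outside the Navigator, the same crash-loop signal is also visible through the Kubernetes API. As a hedged sketch, the Python function below scans the parsed JSON from kubectl get pods -n apache -o json for containers whose current state is waiting with reason CrashLoopBackOff, and reports each one's restartCount plus the reason for its last termination (OOMKilled, for instance, indicates the container exceeded its memory limit). The function name is hypothetical; the field names follow the Kubernetes Pod status API.

```python
from typing import Any, Dict, List


def crashlooping_containers(pod_list: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Containers currently waiting in CrashLoopBackOff (hypothetical helper).

    `pod_list` is the parsed output of `kubectl get pods -n apache -o json`.
    """
    found = []
    for pod in pod_list.get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting", {})
            if waiting.get("reason") != "CrashLoopBackOff":
                continue
            # Why the previous run of this container ended, if known.
            last = status.get("lastState", {}).get("terminated", {})
            found.append({
                "pod": pod["metadata"]["name"],
                "container": status["name"],
                "restarts": status.get("restartCount", 0),
                # For example "OOMKilled" if the memory limit was exceeded.
                "last_exit_reason": last.get("reason"),
            })
    return found
```

To use it, pass in json.loads() of the kubectl output. Because a container flips between running and waiting during a crash loop, a single snapshot can miss it, which is one reason continuous monitoring and detectors are handy here.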

The Kubernetes Navigator helped us identify our resource issues and the impacts they’re having on our containers, but now we need to resolve these issues. Let’s now set up Horizontal Pod Autoscaling so our workload will automatically respond to this increased load and scale out by deploying more pods.

First, we'll create our HPA configuration file at ~/workshop/k3s/hpa.yaml:

[Screenshot: contents of hpa.yaml]

The HorizontalPodAutoscaler object specifies the behavior of the autoscaler. You can set target resource utilization, set the min/max number of replicas, tune scale-up and scale-down behavior, specify the target workload to scale, etc. We'll apply the configuration by running kubectl apply -f ~/workshop/k3s/hpa.yaml.
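Since the HPA configuration in this post only appears as a screenshot, here's a hypothetical reconstruction of an autoscaling/v2 HorizontalPodAutoscaler with the targets the post describes (50% average CPU utilization, 75% average memory utilization, 1 to 4 replicas), built as JSON from Python. The names php-apache and apache come from the post; the scale target kind (StatefulSet, based on the setup described earlier) and its name are assumptions, so the actual hpa.yaml may differ.

```python
import json
from typing import Any, Dict


def resource_metric(resource: str, average_utilization: int) -> Dict[str, Any]:
    """One autoscaling/v2 Resource metric with a Utilization target."""
    return {
        "type": "Resource",
        "resource": {
            "name": resource,
            "target": {"type": "Utilization", "averageUtilization": average_utilization},
        },
    }


def hpa_manifest(name: str, namespace: str, target_kind: str,
                 min_replicas: int, max_replicas: int,
                 cpu_pct: int, memory_pct: int) -> Dict[str, Any]:
    """Build a HorizontalPodAutoscaler manifest (hypothetical helper)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the HPA targets a workload with the same name.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": target_kind, "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                resource_metric("cpu", cpu_pct),
                resource_metric("memory", memory_pct),
            ],
        },
    }


if __name__ == "__main__":
    # Targets from this post; the StatefulSet kind is an assumption.
    manifest = hpa_manifest("php-apache", "apache", "StatefulSet",
                            min_replicas=1, max_replicas=4, cpu_pct=50, memory_pct=75)
    # JSON is also valid YAML, so this output can be saved as hpa.yaml.
    print(json.dumps(manifest, indent=2))
```

Redirecting the script's output to a file gives something kubectl apply -f can read, since kubectl accepts JSON manifests as well as YAML. Listing both metrics means HPA computes a replica count for each and uses the larger one.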

We can see that the autoscaler was created and we can validate Horizontal Pod Autoscaling with the kubectl get hpa -n apache command. Here’s what the response looks like: 

[Screenshot: output of kubectl apply and kubectl get hpa -n apache]
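Reaching the maximum replica count is one clear sign you're hitting a scaling limit (in this walkthrough, the HPA scales out to its maximum of 4 pods). As a minimal sketch, assuming kubectl is installed and configured for the cluster, the Python below reads kubectl get hpa -n apache -o json and flags any autoscaler whose status.currentReplicas has reached spec.maxReplicas. Both helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List


def fetch_hpas(namespace: str) -> Dict[str, Any]:
    """Return parsed `kubectl get hpa -n <namespace> -o json` output.

    Assumes kubectl is on PATH and configured for the target cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "hpa", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def hpas_at_max(hpa_list: Dict[str, Any]) -> List[str]:
    """Names of autoscalers whose current replicas have reached maxReplicas."""
    at_max = []
    for hpa in hpa_list.get("items", []):
        # Treat a missing currentReplicas (e.g. no status yet) as 0.
        current = hpa.get("status", {}).get("currentReplicas", 0)
        if current >= hpa["spec"]["maxReplicas"]:
            at_max.append(hpa["metadata"]["name"])
    return at_max
```

For example, print(hpas_at_max(fetch_hpas("apache"))) lists the autoscalers in the apache namespace that have no headroom left. Splitting fetching from checking keeps the check easy to test without a cluster.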

Now that HPA is deployed, our php-apache service will autoscale when either the average CPU utilization goes above 50% or the average memory utilization for the deployment goes above 75%, with a minimum of 1 pod and a maximum of 4 pods. In the Kubernetes Navigator nodes view, we can validate that we now have 4 pods to handle the increased load. We've added a filter to highlight the 4 pods in the apache namespace:

[Screenshot: Kubernetes Navigator nodes view highlighting 4 pods in the apache namespace]

Looking at our K8s pods tab, we can see additional pod-level metrics and again verify the number of active pods is now 4. 

[Screenshot: K8s pods tab showing 4 active pods]

If we wanted to allow up to 8 pods, we could simply set maxReplicas to 8 in our hpa.yaml and reapply it. Once deployed, we can see we now have 8 active Apache pods:

[Screenshot: 8 active Apache pods]
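Editing hpa.yaml and reapplying it keeps the file as the source of truth. For a quick experiment, an alternative (not the approach used in this post) is a JSON merge patch with kubectl patch. The Python sketch below only builds that command and checks the new maximum against minReplicas; the helper name is hypothetical, and a patched cluster will drift from hpa.yaml until the file is updated too.

```python
import json
import shlex
from typing import List


def max_replicas_patch_cmd(name: str, namespace: str,
                           max_replicas: int, min_replicas: int = 1) -> List[str]:
    """Build a kubectl command that merge-patches an HPA's maxReplicas.

    Hypothetical helper; it only builds the argument list and runs nothing.
    """
    # maxReplicas must be at least 1 and no lower than minReplicas.
    if max_replicas < max(1, min_replicas):
        raise ValueError("maxReplicas must be at least 1 and at least minReplicas")
    patch = json.dumps({"spec": {"maxReplicas": max_replicas}})
    return ["kubectl", "patch", "hpa", name, "-n", namespace,
            "--type", "merge", "-p", patch]


# Print a shell-quoted version of the command for review before running it.
print(shlex.join(max_replicas_patch_cmd("php-apache", "apache", 8)))
```

Passing the returned list to subprocess.run(cmd, check=True) would apply it. Raising maxReplicas only raises the ceiling; HPA still decides how many pods to run based on the metrics.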

After configuring our HorizontalPodAutoscaler, we can sit back and watch our container count remain steady as our pods autoscale to handle the increased traffic. 

Wrap Up 

If you're interested in automatically scaling Kubernetes workloads to match increased load with minimal manual intervention, Horizontal Pod Autoscaling might be for you. Before you get started, watch out for the common gotchas we mentioned. To identify pods with heavy resource utilization that might benefit from HPA, check out the Splunk Observability Cloud Kubernetes Navigator. Don't have Splunk Observability Cloud? We got you. Start a Splunk Observability Cloud free trial! Ready to jump into the Kubernetes Navigator? Get started integrating Kubernetes and Splunk Observability Cloud!
