Top 5 Container Metrics

28

March, 2019 by Christian Meléndez

Even though containers are different from virtual machines (VMs), most of the metrics you get from a container are pretty similar to the ones you get from a VM or a physical server. What’s different is the meaning a metric has in a containerized system. Metrics are important because they give you a better understanding of what’s happening in the system and how that affects the continuous delivery pipeline. Today’s post is about the top container metrics you’ll want to look at before going deeper when troubleshooting. Most of the time, problems are solved just by scaling out the containers. With the help of container orchestrators like Kubernetes, most applications will self-heal, removing the need for manual intervention before scaling out. Ready? Let’s take a look at the top metrics when working with containers.

1. Memory Consumption

Memory is the most critical metric when it comes to Docker containers because when the memory is full, the container stops working. Java applications usually face the problem of being terminated when memory isn’t configured correctly. When a Docker container’s memory hits the limit, Docker kills the process inside the container that is causing that. Docker protects the host by stopping the container so there’s no chance it will affect other containers running in the same host. What’s good about containers is that they can be started again in a matter of seconds if Docker stopped them. Even so, the application will have some downtime—even if it’s only for a short period. If you’re using Docker Compose, you can configure the container to start again when it’s stopped. Or if you’re using Kubernetes, you can configure the desired state of how many containers you want to be running all the time. Kubernetes will then continuously validate that the state is compliant by recreating containers. Memory is essential not only to avoid unexpected responses for an application, but also to set the guidelines for when to scale out the containers. The memory metric will set a threshold to define autoscaling rules. An auto-scaling rule might say, “If memory consumption is higher than 80 percent, then start two more containers.” It’s always best to scale out containers based on memory consumption to avoid unresponsive applications. It’s good to know what the memory consumption is, but you need to take preventive actions. Container orchestrators like Kubernetes enable you to do that.

2. CPU Usage

Another critical metric in a container is CPU usage. In Docker, by default containers have full access to the CPU resources in the host. To avoid affecting other containers in the host, you can limit the percentage of CPU that Docker will allow a container to use. For example, let’s say that a container has a CPU configuration of 0.5. That means the container will have access to 50 percent at most of one CPU core from the host. Docker will not stop a container when the container reaches its CPU limit, but the application’s performance will be negatively impacted when this happens. With the CPU metric, you can quickly identify if you need to give more CPU resources to the container or not. Containers will help you to optimize resources but only if you know where you need to tune—for example if you’re configuring way too many CPU resources and the container only uses a small portion of those resources. As with memory, the CPU metric works as a threshold to configure autoscaling rules for the containers. In Kubernetes or any other container orchestrator, you can define autoscaling rules when a container reaches the CPU limit to add more containers. Therefore, applications performance will be steady because the orchestrator will make sure to provide as many containers as are needed.

3. Disk Operations

Disk operations metrics like I/O operations or disk usage are numbers that could affect more than just a single container’s performance. The disk is a resource that both containers and the host depend on to provide good performance. You can get I/O metrics with the “docker stats” command. And you can get disk usage metrics with the “docker system df” and get more detailed information by adding the “-v” flag. Controlling the disk usage in the host is crucial to avoid problems when pulling new container images. Images could be huge, which is why you might need to run a clean-up process to remove old images. Logs written to the disk is another reason why disk usage increases. So “log rotation” is a good practice you still need to implement with containers. Even though containers are stateless by default, other applications are stateful like a database. For stateful applications, you’ll use volumes. A volume is how you map a folder or driver inside the container with another path or driver in the host. You could create volumes in the host before assigning them to a container. Disks become a shared resource that could affect containers’ and the host’s performance.

4. Network Traffic

As you’ve seen with the previous metrics, the same type of metrics that are important in a virtual or physical server are still crucial for containers. So networking traffic can’t be missed when discussing the top metrics for containers. You could get networking traffic with the “docker stats” command to know the amount of data a container has sent or received. Knowing how much traffic is happening in a container could be a clue whether the host for a container is appropriate or not. Maybe the container needs to be running in a host that has better network performance. Or perhaps you need to scale out the containers. I’ve seen cases where specific applications, memory, and CPU were OK but the application’s performance was still poor. When taking a look at the networking metrics, I’ve found some interesting information. Sometimes, the network traffic was low, and it was because of a misconfiguration in the load balancer. Other times, the applications in the container weren’t able to use enough network bandwidth. My point here is that there were times where I wasn’t able to spot problems (for example because of a host type or a bug in the application) with memory or CPU metrics, but I was able to with network metrics.

5. Number of Containers Running

To run containers at scale, you need to use a container orchestrator like Kubernetes or Swarm. There are going to be times when the orchestrator won’t be able to schedule more containers because there are no more resources available. To fix this, you could use the number of containers that are running to scale out the host. I noticed the importance of this metric when a friend used it to demonstrate that certain containers were having problems. It turns out that because of a misconfiguration, the container was running only for a period of time. Of course, the orchestrator was spinning up a new container but the application was unstable. You can get this metric by running a “docker container is” command or by asking the orchestrator how many containers are running for an application. In Kubernetes you can get this information by listing the pods that are running or by getting the state of a deployment object. The number of containers running is not a metric of a container per se. Sometimes you need to zoom out to get a different perspective on a problem.

Don’t Rely Only Upon Top Metrics

I didn’t give you too many details on how to collect these types of metrics for containers. The reason for that is that you now have a lot of tools for that job, which are available as paid services or for free. Some examples of free tools are cAdvisor, InfluxDB, Grafana, or Prometheus. These tools will give you other metrics—not just for the containers, but also for the host. So don’t rely only on the top metrics I listed here. There are going to be times when these metrics look good and maybe the host is running out of IOPS, or the memory swap is terrible, which is when alternate metrics will come in handy. Knowing which are the top metrics for containers is just the beginning of improving the lead time. For example, you can also use these metrics when drawing the value stream mapping.

Contact Us

Christian Meléndez This post was written by Christian Meléndez. Christian is a technologist that started as a software developer and has more recently become a cloud architect focused on implementing continuous delivery pipelines with applications in several flavors, including .NET, Node.js, and Java, often using Docker containers.

Relevant Articles

Release vs Deployment Management: What’s the Difference?

0 Comments

In the always-an-adventure world of IT service management, there are several key processes that are essential for delivering high-quality services to customers and end-users. Two of the most critical processes are release management and deployment management. These...

7 Tools to Help with Application Rationalization

0 Comments

Application rationalization is the process of identifying which applications an organization should keep, update, consolidate, or retire. Think of it as a financial adviser, but instead of your investment portfolio, it's your application portfolio. Most companies take...

Pairing DevOps with Test Environment Management

0 Comments

For many organizations, DevOps is the best practice for efficiency. However, this model doesn’t come easily as the organization needs to put certain things in place. For example, the firm needs to incorporate the right tools to ensure its delivery pipeline and...

8 DevOps Anti-Patterns You Should Avoid

0 Comments

It’s the normal case with software buzzwords that people focus so much on what something is that they forget what it is not. DevOps is no exception. To truly embrace DevOps and cherish what it is, it’s important to comprehend what it isn’t. A plethora...

An Introductory Guide to Guidewire Data Masking

0 Comments

Testing is an essential part of maintaining a healthy Guidewire environment. But because Guidewire applications handle large volumes of personally identifiable information (PII), simply copying production data for testing or training isn’t an option. This is where...

Types of Test Data: 4 to Use for Your Software Tests

0 Comments

Testing is an integral and vital part of creating software. In fact, test code is as important as your production code. When you create test code, you need to create test data for your code to work against. This post is about the different types of test...