It’s not news that there is a lot of buzz around containers. As companies begin to widely deploy microservices architectures, containers are the obvious choice with which to implement them. As companies deploy container clusters into production, however, an issue has to be dealt with immediately: container architectures have a lot of moving parts. The whole point of microservices is to break apart monolithic components into smaller services. This means that what was once a big process running on a resource-rich server is now multiple processes spread across one or many servers. On top of the architecture change, a container cluster usually encompasses a variety of containers that are not application code. These include security, load balancing, network management, web servers, etc. Entire frameworks, such as NGINX Unit 1.0, may be deployed as infrastructure for the cluster. Services that used to be centralized in a network are now incorporated into the application itself as part of the container network.
Because an “application” is now really a collection of smaller services running in a virtual network, there’s a lot more that can go wrong. The more containers, the more opportunities for misbehaving components. For example:
- Network issues. No matter how the network is actually implemented, there are opportunities for typical network problems to emerge including deadlocked communication and slow connections. Instead of these being part of monolithic network appliances, they are distributed throughout a number of local container clusters.
- Apps that are slow and make everything else slower. Poor performance of a critical component in the cluster can drag down overall performance. With microservices, the entire app can be waiting on a service that is not responding quickly.
- Containers that are dying and respawning. A container can crash which may cause an orchestrator such as Kubernetes to respawn the container. A badly behaving container may do this multiple times.
These are just a few examples of the types of problems that a container cluster can have that negatively affect a production system. None of these are new to applications in general. Applications and services can fail, lock up, or slow down in other architectures. There are just a lot more parts in a container cluster creating more opportunities for problems to occur. In addition, typical application monitoring tools aren’t necessarily designed for container clusters. There are events that traditional application monitoring will miss especially issues with containers and Kubernetes themselves.
To combat these issues, a generation of products and open source projects are emerging that are retrofit or purpose-built for container clusters. In some cases, app monitoring has been extended to include containers (New Relic comes to mind). New companies, such as LightStep, have also entered the market for application monitoring but with containers in mind from the onset. Just as exciting are the open source projects that are gaining steam. Prometheus (for application monitoring), OpenTracing (network tracing), and Jaeger (transaction tracing) are some of the open source projects that are help gather data about the functioning of a cluster.
What makes these projects and products interesting is that they place monitoring components in the clusters, close to the applications’ components, and take advantage of container and Kubernetes APIs. This placement helps sysops to have a more complete view of all the parts and interactions of the container cluster. Information that is unique to containers and Kubernetes are available alongside traditional application and network monitoring data.
As IT departments start to roll scalable container clusters into production, knowing what is happening within is essential. Thankfully, the ecosystem for monitoring is evolving quickly, driven equally but companies and open source communities.