Why Cloud “Server” Tagging Strategies Are Important
by Christian Meléndez
A key part of Enterprise IT Intelligence is understanding your cloud resources. And where better to start than with the “elegant” concept of tagging, a concept used across the various cloud and infrastructure management platforms. What follows is an article from Christian Meléndez.
A tag is a key/value label to identify a resource in a cloud environment. You might want to know why a server exists and what purpose it serves. AWS has made sure that a simple concept like a tag can become a necessary tool for all of your resources. Tags are free, and they can do more than just give a name to a server. Having a solid tagging strategy will allow you to support the management of your IT delivery life-cycle and your IT solutions.
My first encounter with the cloud happened when working in a previous position. An initial project was to reduce the cloud spending. At the time, this company didn’t even have a naming convention for servers, so trying to identify which server was still in use was challenging. Who created the server? Why was it created? Is it still needed? Consumption metrics weren’t a good indicator of a server’s usefulness because it may have been created in advance for an upcoming project. Or maybe its only consumption was the load balancer doing health checks.
But is identifying resources the only purpose of tags? No. So why else would you use a tag, you might wonder? Well, that’s a good question, and it also happens to be the topic of this post! Let’s find out in more depth.
To Identify Cost
When I was trying to reduce costs in the company I mentioned earlier, the first thing we needed to do was to tag the resources appropriately. We started with the most expensive resources by setting a “Cost Center” tag. We were working with a partner to get cost reports with daily updates. AWS didn’t have cost reports at that time, so we needed to find a way to confirm the bill was going down.
After we determined that having a cost center tag was useful, we started using different types of tags. We used tags like the username, environment, project, and system; these allowed us to know which departments were spending more and to decide if the department’s costs were proportional to its revenue. Every time we detected a change in costs, it was easy for us to explain why it happened: a new project was coming, someone ran load tests, or an environment was terminated.
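As a rough illustration of the idea, here is a minimal Python sketch that sums spending per tag value. The record layout (`resource`, `cost`, `tags`), the tag names, and the `cost_by_tag` helper are all assumptions made for this example; a real cost report from your provider or partner will have its own format.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Sum the cost of all line items per value of a single tag key."""
    totals = defaultdict(float)
    for item in line_items:
        # Resources without the tag are grouped together so they stay visible.
        value = item.get("tags", {}).get(tag_key, untagged_label)
        totals[value] += item["cost"]
    return dict(totals)


# Hypothetical billing records, one per resource.
line_items = [
    {"resource": "i-0a1", "cost": 120.0,
     "tags": {"CostCenter": "marketing", "Environment": "prod"}},
    {"resource": "i-0b2", "cost": 45.5,
     "tags": {"CostCenter": "marketing", "Environment": "test"}},
    {"resource": "vol-0c3", "cost": 30.0, "tags": {"CostCenter": "platform"}},
    {"resource": "i-0d4", "cost": 80.0, "tags": {}},
]

print(cost_by_tag(line_items, "CostCenter"))
```

Keeping untagged resources in their own bucket makes the gap in the tagging strategy show up in the same report as the costs.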
A good tagging strategy will help you to identify who’s wasting resources and why. Even though we were able to tag all existing resources, there were times that a resource lacked a proper tag. Make sure you solve this problem from the root to have a proactive strategy, not a reactive one.
To Automate Operability
Another reason tagging is important is that it helps you automate your operability. Automation in a cloud environment is never up for debate. Without automation, you’re losing essential benefits of the cloud.
The moment I knew tags were useful for more than just allocating cost was when I started using Ansible. Ansible is a configuration management tool that we used to configure servers after every deployment. Ansible knows which servers to configure by reading an inventory that lists their IP addresses.
But wait—having a list of IP addresses in an environment where IPs come and go isn’t sustainable! Here’s where tags come into play again. Working with dynamic environments is one of the best value propositions from Ansible. You only need to create the inventory based on the tags you want to use to identify a server. Then Ansible will build the environment dynamically by querying the AWS API to get the list of IPs.
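To make the idea concrete, here is a small Python sketch of what a tag-based inventory boils down to: grouping the addresses of running instances by tag. It is not Ansible's own implementation. The instance records, the `build_inventory` helper, and the `tag_<key>_<value>` group naming are assumptions for illustration; in practice the records would come from the cloud provider's API.

```python
from collections import defaultdict


def build_inventory(instances, tag_keys):
    """Group the private IPs of running instances by the given tag keys."""
    groups = defaultdict(list)
    for inst in instances:
        if inst.get("state") != "running":
            continue  # stopped or terminated servers should not be configured
        for key in tag_keys:
            value = inst.get("tags", {}).get(key)
            if value is not None:
                groups[f"tag_{key}_{value}"].append(inst["private_ip"])
    return {name: sorted(ips) for name, ips in groups.items()}


# Hypothetical instance records, as a provider API might describe them.
instances = [
    {"private_ip": "10.0.1.10", "state": "running",
     "tags": {"Environment": "prod", "Role": "web"}},
    {"private_ip": "10.0.1.11", "state": "running",
     "tags": {"Environment": "prod", "Role": "db"}},
    {"private_ip": "10.0.2.20", "state": "stopped",
     "tags": {"Environment": "test", "Role": "web"}},
]

inventory = build_inventory(instances, ["Environment", "Role"])
print(inventory)
```

Because the groups are rebuilt from tags on every run, a server that was replaced overnight ends up in the right group without anyone editing a list of IPs.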
Another great use case for tags is when you need to run clean-up tasks like deleting old volume snapshots. You could also use tags to keep instances running only during business hours. Or you can create an expiration date tag for the resource.
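The expiration date idea could be sketched like this in Python. The `ExpirationDate` tag name, its ISO date format, and the `expired_resources` helper are assumptions for this example; the actual deletion step is left out on purpose.

```python
from datetime import date


def expired_resources(resources, today, tag_key="ExpirationDate"):
    """Return the ids of resources whose expiration tag is before `today`."""
    expired = []
    for res in resources:
        raw = res.get("tags", {}).get(tag_key)
        if raw is None:
            continue  # no expiration tag: never selected for clean-up
        try:
            expires = date.fromisoformat(raw)
        except ValueError:
            continue  # malformed date: skip rather than guess
        if expires < today:
            expired.append(res["id"])
    return expired


# Hypothetical resources with an expiration tag in YYYY-MM-DD format.
resources = [
    {"id": "snap-001", "tags": {"ExpirationDate": "2020-01-31"}},
    {"id": "snap-002", "tags": {"ExpirationDate": "2020-12-31"}},
    {"id": "snap-003", "tags": {}},
    {"id": "snap-004", "tags": {"ExpirationDate": "next week"}},
]

print(expired_resources(resources, today=date(2020, 3, 1)))
```

Skipping resources with a missing or unreadable tag is the cautious choice here: a clean-up job should only delete what was explicitly marked.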
Tags change the way you manage the infrastructure by introducing automation into your IT delivery life-cycle.
To Set Proper Monitoring and Alerting Policies
Tagging is also going to help you pay attention to the things you should.
When it comes to monitoring and alerting, one of the things you want is to reduce noise. When there’s too much noise, you start ignoring the alerts. An alert’s purpose is to trigger an action from someone or something. If it doesn’t do that, why bother creating the alarm? You could collect all the metrics you think you’ll need from your resources, but you need a way to identify events. One way to organize and reduce noise is by using tags.
A fundamental principle in continuous delivery is to work with production-like environments, and this includes feedback. You might be surprised by the number of companies I’ve seen that decide to only monitor production resources. While this will save you some money, you won’t receive feedback until much later in the delivery pipeline. Another reason behind less feedback is that IT Ops folks don’t want to deal with alerts from non-production environments.
In my experience, what works is to have different retention and alert policies for metrics data per environment. You don’t lose visibility, and by using tags, you can set policies like, “Don’t alert the on-call engineer for the non-production environment.” And this policy could also be accompanied by another one, like, “For the non-production environments, alert the developers while they’re in the office.”
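Those two policies could be expressed as a small routing function. This is only a sketch: the `Environment` tag values, the office hours (weekdays, 09:00 to 18:00), the returned route names, and the `route_alert` helper are assumptions for illustration, not the configuration of any particular monitoring tool.

```python
from datetime import datetime

# Assumed tag values for non-production environments.
NON_PRODUCTION = {"dev", "test", "staging"}


def route_alert(alert, now):
    """Decide who gets an alert based on the resource's Environment tag."""
    env = alert.get("tags", {}).get("Environment")
    if env not in NON_PRODUCTION:
        # Production, or an untagged resource we cannot classify: page on-call.
        return "page-on-call"
    in_office = now.weekday() < 5 and 9 <= now.hour < 18
    if in_office:
        return "notify-developers"
    return "hold-until-office-hours"


alert = {"name": "HighCPU", "tags": {"Environment": "test"}}
print(route_alert(alert, datetime(2020, 3, 18, 10, 0)))  # a Wednesday morning
```

Routing untagged resources to the on-call engineer is a deliberate choice in this sketch: it keeps unknown resources visible and gives the team a reason to tag them.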
Using tags in your monitoring and alerting policies will help you cut the noise and send each alert to the people who can act on it.
To Discover Wasters
We talked a little bit about using tags to identify where you’re spending the money, but you can also use them to spot where you’re overprovisioning resources in the cloud. When you’re moving to the cloud, it’s essential that you know what you currently have. You need to discover and inventory your resources so that you know which types of instances you’ll need and how many. You don’t want to be creating support tickets to increase limits while you run the migration.
But it’s pretty common that we initially overprovision resources because we find that one CPU in the cloud is not the same as one CPU on-prem. Or we find that the cloud provider only offers pre-defined configurations for CPU and memory (Google being the exception).
By using tags, you’re able to know which workloads need enhanced networking or memory-optimized instances. When you choose instances optimized for your needs, you might end up needing fewer servers.
To Apply Restriction Controls
After you’ve seen the benefits of using tags, you might want to make sure you don’t have untagged resources anymore. Well, you can enforce tags and their value every time a new resource is created. It’s always better to receive an error than to have to find out later who created the resource and why. Enforcing tags is not a well-supported feature in all cloud providers, meaning that you might not receive descriptive errors. For that reason, consider having a checklist document, templates, or pre-built catalogs.
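Where the provider's own enforcement falls short, a check in your provisioning templates or pipeline can return the descriptive error instead. Here is a minimal Python sketch; the required tag names, the allowed `Environment` values, and the `validate_tags` helper are assumptions for illustration.

```python
# Assumed policy: tag key -> set of allowed values, or None for "any value".
REQUIRED_TAGS = {
    "CostCenter": None,
    "Project": None,
    "Environment": {"dev", "test", "prod"},
}


def validate_tags(tags, required=REQUIRED_TAGS):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, allowed in required.items():
        value = tags.get(key)
        if not value:
            errors.append(f"missing required tag '{key}'")
        elif allowed is not None and value not in allowed:
            errors.append(
                f"tag '{key}' has value '{value}', "
                f"expected one of {sorted(allowed)}"
            )
    return errors


# A request that forgot the cost center and used an unknown environment.
problems = validate_tags({"Project": "consumer-site", "Environment": "qa"})
for problem in problems:
    print(problem)
```

Running a check like this before a resource is created turns "who made this and why?" into an error message the requester sees immediately.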
Another use case around restriction is that for some resources you can grant permissions based on tags. For example, you could use a tag to identify which project the resource belongs to. Then, you create a permission policy that restricts access based on that tag. When a user has this type of policy attached, they can’t access resources other than the ones the project requires. A developer for the consumer site won’t have access to backend services.
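On AWS, this kind of policy is usually written with the `aws:ResourceTag/<key>` condition key. The Python sketch below builds such a policy document. The `Project` tag key, the `consumer-site` value, the two EC2 actions, and the `project_policy` helper are assumptions for illustration; not every service or action supports tag conditions, so check the provider's documentation before relying on one.

```python
import json


def project_policy(project, actions=("ec2:StartInstances", "ec2:StopInstances")):
    """Build an IAM-style policy allowing actions only on one project's resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                # Only resources tagged Project=<project> match this statement.
                "Condition": {
                    "StringEquals": {"aws:ResourceTag/Project": project}
                },
            }
        ],
    }


policy = project_policy("consumer-site")
print(json.dumps(policy, indent=2))
```

The same function works for any project name, which is what makes tag-based policies generic: one template, many projects.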
Using tags for restrictions is an easy way to create generic permission policies.
Tags Are Your Infrastructure Metadata
Tags are a straightforward concept: a key/value pair that you use to identify a resource. But as I described in today’s post, tags can become your best ally when managing your cloud resources. Tags aren’t coupled to the applications you’re running on a server; you can change their values on existing infrastructure at any time. If you change your mind about your tagging strategy, you shouldn’t need to reprovision or restart services.
In the past, a good naming convention was important. You knew a lot about a resource just by reading its name. It’s still essential to have a good naming convention, but sometimes just a name isn’t enough; you need to have more context and to reduce manual labor. Tags are a perfect complement when you want to lean on automation.
Don’t underestimate tags because of the simplicity of the concept; they’re a powerful tool.
This post was written by Christian Meléndez. A regular poster for Enov8, Christian is a technologist who started as a software developer and has more recently become a cloud architect focused on implementing continuous delivery pipelines with applications in several flavors, including .NET, Node.js, and Java, often using Docker containers.