Environment Resilience – Hiring an SRE Team

May, 2020 by Eric Boersma
Taking on Site Reliability Engineering (SRE) is not an easy task, no matter where you’re coming from. Some organizations have done a little DevOps and are trying to break into SRE. Others haven’t even taken that step, and figure they should go all the way with their implementation. Either way, hiring an SRE team is a big undertaking. There are a lot of pitfalls along the way. With the correct planning, though, you and your team can navigate them successfully.

Adding new Site Reliability Engineers to a team can be stressful. That’s true for both the company and the employee. If you hire an SRE team but you’re not prepared to support them, they’re going to have a bad experience. In turn, your experience of the process will be bad, too. In this blog post, we’ll talk about some things you can do before, during, and after hiring an SRE team to make it a smashing success.

Before Hiring an SRE Team

There are a few steps you can take before hiring an SRE team or engineer that will make your process immeasurably smoother.

Know What Success Looks Like for the Team

You’d think that this goes without saying. However, you’d be surprised at the number of organizations I’ve worked with that don’t know what it takes for a new hire to be successful in a role. That kind of ambiguity makes it difficult to attract good talent and to evaluate their performance. While you don’t want to have every minute of their first six months on the job planned out, you do want to have a good picture of how they’ll thrive. I try to answer the question “What would a really great first year from this person look like?” If I can’t answer that, I’m probably not ready to hire someone.

Know Your Existing Strengths and Weaknesses

Oftentimes, when writing job descriptions, managers will simply copy and paste the list of requirements they used the last time they hired someone. This is a bad idea! You don’t want to hire someone who’s going to have the same skills and shortcomings as the people already on your team. For instance, if your team already has a couple of engineers and tools that knock Environment and Release Management out of the park, adding another person with skills in that area isn’t likely to improve the team. Instead, you might want to look for someone who has skills in Test Data Management or security. Adding that person to your team means you get stronger. Even if you’re trying to replace someone who’s leaving, you don’t want a line-for-line replacement for that person. If you’ve been doing your job as a manager, you should have been training other people up to match that person’s skills, so their job description won’t be what you need anymore. Instead, think hard about the skills you need but don’t have. Write a job description that emphasizes the skills you need, not the skills you already have.

While Hiring an SRE Team

The process of hiring a new SRE team or engineer can be quite complicated. It’s easy to get lost in the weeds while hiring and make decisions that you regret later on. Here are a couple of things to keep in mind.

Filter for Good Incident Response

Incident response is a significant part of SRE, for both individuals and teams. Rapidly expanding teams are more likely to experience breaks when new systems fail. That’s OK. That kind of uncertainty is built into the SRE mindset. What’s not OK is those breaks turning into significant outages. This is why it’s important to look for people who have experience responding to outages and incidents. SRE as a discipline is still relatively young. If you only hire people who have years of experience in SRE, you’re going to be paying through the nose. Instead, you want to identify the kinds of skills that will make someone a valuable SRE without necessarily requiring that they’ve done the job before. Being calm under pressure and able to respond quickly in a crisis are great skills for a Site Reliability Engineer to have.

Good Communication Is a Must

Again, this seems like it’s something that should be obvious when hiring new people. In today’s world, it’s not OK to be bad at communicating! However, there’s a misconception that goes around about tech folks. That misconception says that they’re all reclusive nerds who have difficulty communicating with people who don’t have the same background. That couldn’t be further from the truth. You should expect your Site Reliability Engineers to be just as good at communication as anyone else on your team. You need that good communication because when things break down, you need them to tell you what’s going on. If they’re good communicators, you’ll reduce your team’s overall stress levels. If they’re bad communicators, the opposite will be true.

After Hiring an SRE Team

The hard work of building a team doesn’t end when you make a hire.

Regularly Checking In

SRE work, especially for a new team, can feel very stressful. Oftentimes, an SRE is trying to create a whole new system to support an existing process people are used to working with. This new system is going to be better. Both the engineer and the people they’re supporting know that. However, the reality is that getting to that new system requires taking some bruises along the way. That vision you had of a high-quality first year for your new hire? They’re almost certainly not going to cross every item off your list. There are going to be some failures along the way, and this can feel demoralizing. You can support your new employee by regularly checking in with them and providing feedback on their performance. Stress-free employees do better work than stressed employees. It’s your job to help them manage that stress.

Keep Training Them Up

Another common mistake from teams that hire people to do SRE work is that they don’t know how to train them. Someone comes in, and the team points them at a new system and says “build that.” The new hires are expected to handle learning things all by themselves. Needless to say, this is not a tactic with a spotless track record. Instead, you should be working with your employees, at those regular check-ins, to find out what they don’t know yet. It’s likely that you’re going to put them in a position to do things they don’t know how to do yet. If you’re communicating with them about those tasks, you’ll quickly learn what they need to learn. Work with them to make sure they have the resources they need to tackle those new challenges. That might be formal training, or it might be partnering with a more experienced employee. Either way, don’t let your SRE team stagnate.

Empower Your Team to Do Their Best Work

However big your SRE team, following these steps will put them into a position to succeed. Being intentional before, during and after the hiring process gives you the best chance to build a team that transforms your business. Once you’ve hired that team, you want to give them the best tools possible to do their job. That’s where something like Enov8 comes in. Even with talented engineers and a solid support structure, you’ll still need help to manage the myriad new environments you’ll build. Using the right tools makes that a lot easier.

This post was written by Eric Boersma. Eric is a software developer and development manager who’s done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he’s learned along the way, and he enjoys listening to and learning from others as well.
