DevOps versus SRE – Friend or Foe

19 March 2020, by Michiel Mulders

SRE vs DevOps: Friends or Foes?

Nowadays, there’s a lack of clarity about the difference between site reliability engineering (SRE) and DevOps (development and operations). The two roles overlap, yet there are clear distinctions. Whereas DevOps focuses on automating deployments and tests, SRE focuses on post-deployment processes, for example, measuring the application’s performance or aggregating logs to make them easily discoverable.

Do you want to know more about the difference between DevOps and SRE? Let’s first explore both definitions in more depth before comparing these roles.

What’s DevOps?

First of all, allow me to explain DevOps. DevOps proposes to merge the development and operations teams. It means an end to just writing your code and throwing it over the wall for the operations team to deploy and test.

No matter the size of the organization, DevOps tries to align processes and improve communication between both teams. When a feature is finished, the developer helps the operations team test the code, because both teams share responsibility for it.

Besides that, DevOps helps improve the code’s quality. Every time a new feature is complete, the team uses automated tools to build and test it. This lets the team find bugs earlier and gives the development team faster feedback.

In short, the DevOps culture aligns all involved stakeholders around a shared goal: the delivery of high-quality, stable software.

Now that you understand the mission and purpose of DevOps, let’s move on to SRE.
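The automated build-and-test step described in the DevOps section can be pictured as a fail-fast pipeline. The following is a minimal Python sketch, not how any particular CI tool works internally: the stage names and the `run_pipeline` helper are hypothetical, and in practice each stage would be a command configured in your CI system.

```python
from typing import Callable

# A stage is a (name, check) pair; the check returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (succeeded, log), where log records each executed stage's outcome.
    """
    log: list[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'passed' if ok else 'FAILED'}")
        if not ok:
            # Stop immediately so the developer gets feedback on the first problem.
            return False, log
    return True, log


if __name__ == "__main__":
    # Hypothetical stages; a real pipeline would compile code, run the test suite, etc.
    stages: list[Stage] = [
        ("build", lambda: True),
        ("unit_tests", lambda: 2 + 2 == 4),
        ("package", lambda: True),
    ]
    succeeded, log = run_pipeline(stages)
    print("\n".join(log))
    print("pipeline succeeded" if succeeded else "pipeline failed")
```

Stopping at the first failed stage is what delivers the faster feedback: a broken build is reported before any time is spent on later stages.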

What’s SRE?

Here’s how Ben Treynor, the person who developed the SRE role at Google, defines a site reliability engineer:

“Fundamentally, it’s what happens when you ask a software engineer to design an operations function… So SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor.”

To summarize, an SRE is a software engineer with a deep understanding of the code. SREs spend roughly half of their time writing code to improve processes and introduce automation. They’re a crucial part of the development team because they reduce the workload for many developers.

SREs spend the other half of their time monitoring the system’s health. An SRE observes the health of a system by measuring metrics such as availability and reliability. This makes the SRE’s work key to application monitoring and logging.
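Availability, one of the metrics mentioned above, is often expressed as the share of requests that succeeded over a measurement window. The sketch below assumes you already have total and failed request counts from a monitoring tool; the `RequestCounts` type, the 99.9% target, and the choice to report 100% availability when there is no traffic are illustrative assumptions, not a standard definition.

```python
from dataclasses import dataclass


@dataclass
class RequestCounts:
    """Request totals for one service over a measurement window (hypothetical type)."""
    total: int
    failed: int


def availability(counts: RequestCounts) -> float:
    """Fraction of requests that succeeded; assumed to be 1.0 when there was no traffic."""
    if counts.total == 0:
        return 1.0
    return (counts.total - counts.failed) / counts.total


def meets_target(counts: RequestCounts, target: float) -> bool:
    """True if measured availability is at or above the target, e.g. 0.999."""
    return availability(counts) >= target


if __name__ == "__main__":
    window = RequestCounts(total=200_000, failed=150)
    print(f"availability: {availability(window):.4%}")
    print("target met" if meets_target(window, 0.999) else "target missed")
```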

Differences Between DevOps and SRE

Let’s take a look at some key differences in these roles so you can understand how they complement each other and serve the larger organization.

Difference #1: Background

DevOps: A DevOps engineer has a solid understanding of different system architectures and is usually fluent with Unix and Linux systems. These engineers have an in-depth understanding of the deployment process and all the elements involved in it.

SRE: A site reliability engineer is a software engineer who also has a solid understanding of deployed systems. This combination of software engineering and deployment knowledge makes them highly valued.

Next, let’s take a look at how DevOps and SRE differ in terms of measuring metrics.

Difference #2: Metrics

DevOps: Generally speaking, the DevOps mindset isn’t much concerned with metrics. Instead, DevOps’ core goal is to automate development processes, including builds, testing, and deployments.

SRE: An SRE tracks metrics such as the availability and reliability of services. These metrics help an SRE understand the system’s health, so the SRE collects health-related data through application monitoring tools. Logging and log aggregation also make up a large part of the data the SRE captures.
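To illustrate log aggregation, here’s a minimal Python sketch that merges per-service log streams into a single timeline and counts errors per service. It assumes each stream is already sorted by timestamp; the `LogEntry` fields and the `ERROR` level name are hypothetical, and a real setup would rely on a dedicated log aggregation tool rather than code like this.

```python
import heapq
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    timestamp: float  # seconds since the epoch
    service: str
    level: str
    message: str


def aggregate(*streams: Iterable[LogEntry]) -> list[LogEntry]:
    """Merge per-service streams (each already sorted by time) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def count_errors(entries: Iterable[LogEntry]) -> dict[str, int]:
    """Count ERROR entries per service, a simple health signal for a dashboard."""
    counts: dict[str, int] = {}
    for e in entries:
        if e.level == "ERROR":
            counts[e.service] = counts.get(e.service, 0) + 1
    return counts


if __name__ == "__main__":
    api = [LogEntry(1.0, "api", "INFO", "request ok"), LogEntry(3.0, "api", "ERROR", "timeout")]
    db = [LogEntry(2.0, "db", "INFO", "query ok"), LogEntry(4.0, "db", "ERROR", "connection lost")]
    timeline = aggregate(api, db)
    for entry in timeline:
        print(entry.timestamp, entry.service, entry.level, entry.message)
    print(count_errors(timeline))
```

Because `heapq.merge` interleaves inputs that are already sorted, the combined log never has to be re-sorted as a whole.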

Difference #3: Cultural Shift

DevOps: The origin of DevOps reveals a cultural shift. DevOps wants to merge the operations and development teams. In the DevOps view, development should be more than writing code and throwing it over to the operations team to deploy. The idea is to create a more streamlined process and support a more agile way of working.

SRE: A site reliability engineer is a software engineer who takes care of processes related to scaling and measuring the system’s health. SRE goes beyond a cultural shift: it actively redesigns processes to make development and operations teams more efficient.

Difference #4: Automation

DevOps: DevOps helps provide faster feedback about code through a continuous integration (CI) pipeline. The pipeline automates testing, and it also builds the software for target platforms, producing artifacts. A CI tool can execute tests in parallel, which makes test execution much faster than running them locally on an engineer’s laptop.

In short, DevOps is concerned with automating processes in the development and deployment phases.

SRE: SRE professionals focus on automating processes related to scaling and measuring the system’s health, which means they mostly automate post-deployment work. For example, an SRE might build a dashboard that aggregates information about server health, such as memory usage and CPU allocation. Also, as mentioned previously, the SRE aggregates logs, with help from application performance management (APM) tools, to make them more discoverable.

In brief, an SRE collects data about running services so he or she can quickly solve issues when a system crashes. Automating this data collection is therefore a key part of the role.
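The server-health dashboard described above boils down to summarizing raw samples per server. This Python sketch assumes (CPU %, memory %) samples have already been collected by some monitoring agent; the `HealthSummary` fields and the 85% CPU / 90% memory alert thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HealthSummary:
    avg_cpu: float   # percent
    peak_cpu: float  # percent
    avg_mem: float   # percent
    peak_mem: float  # percent
    alert: bool      # True if either peak crossed its threshold


def summarize(
    samples: dict[str, list[tuple[float, float]]],
    cpu_limit: float = 85.0,  # hypothetical alert threshold
    mem_limit: float = 90.0,  # hypothetical alert threshold
) -> dict[str, HealthSummary]:
    """Turn raw (cpu%, mem%) samples per server into one dashboard row each."""
    summary: dict[str, HealthSummary] = {}
    for server, points in samples.items():
        if not points:
            continue  # no data for this server in the window; skip it
        cpu = [c for c, _ in points]
        mem = [m for _, m in points]
        summary[server] = HealthSummary(
            avg_cpu=mean(cpu),
            peak_cpu=max(cpu),
            avg_mem=mean(mem),
            peak_mem=max(mem),
            alert=max(cpu) > cpu_limit or max(mem) > mem_limit,
        )
    return summary


if __name__ == "__main__":
    rows = summarize({"web-1": [(20.0, 40.0), (40.0, 50.0)], "web-2": [(90.0, 60.0)]})
    for server, row in rows.items():
        print(server, row)
```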

Difference #5: Dealing With Bugs and Application Failures

DevOps: A DevOps engineer typically lacks a deep understanding of the application code. In case of an error, the DevOps engineer can roll back to a previous version to restore service. The goal is to minimize the impact of an issue and restore service as soon as possible. After that, the development team should investigate and fix the issue.

SRE: In contrast, a site reliability engineer has a deep understanding of the software. This allows him or her to solve nasty bugs on the spot using the data he or she keeps track of, since tracking relevant data greatly eases debugging. SREs can also be brought in to tackle trickier bugs, such as performance issues and memory leaks.

In case of a crash, an SRE can use the aggregated logs to play back the events that led up to it. Having this data readily available helps the SRE solve such problems quickly.

In short, a DevOps engineer usually doesn’t have the knowledge of the software required to solve the problem, whereas an SRE can use his or her data to solve it.
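Playing back the events that led up to a crash amounts to filtering the aggregated logs to a time window before the crash and sorting them. The Python sketch below assumes every event carries a timestamp; the `Event` type and the five-minute default window are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    at: datetime
    service: str
    message: str


def events_before_crash(
    events: list[Event],
    crash_at: datetime,
    window: timedelta = timedelta(minutes=5),  # hypothetical default look-back
) -> list[Event]:
    """Return, in time order, the events in [crash_at - window, crash_at]."""
    start = crash_at - window
    return sorted((e for e in events if start <= e.at <= crash_at), key=lambda e: e.at)


if __name__ == "__main__":
    crash = datetime(2024, 1, 1, 12, 0, 0)
    log = [
        Event(crash - timedelta(minutes=10), "api", "deploy finished"),
        Event(crash - timedelta(minutes=2), "db", "connection pool exhausted"),
        Event(crash - timedelta(minutes=4), "api", "latency rising"),
    ]
    for e in events_before_crash(log, crash):
        print(e.at.isoformat(), e.service, e.message)
```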

Conclusion: DevOps and SRE Are Both Necessary

To summarize, DevOps is concerned with automating deployments and tests, whereas SRE focuses on tasks after deployment: SRE works on automating tasks related to availability and the system’s health.

DevOps and SRE have many elements in common. For example, both fields care about automation. The main difference is the stage in which each is active. DevOps focuses on the development and deployment phases; it empowers developers and gives them fast feedback about their code. In contrast, SRE is concerned with post-deployment automation and with improving related processes. In the end, your organization needs to understand that the roles are different.

Although the two terms are often used interchangeably, for large organizations it makes sense to have both DevOps and SRE. SREs play a critical role in keeping a service available because they can quickly debug problems based on their knowledge of the software.

In smaller organizations, the same person or team often handles both SRE and DevOps. Although this double duty is sometimes necessary, it can put too much stress on your operations team. Startups often use software engineers who can help with DevOps, thereby creating an indirect SRE role.

In essence, a site reliability engineer should try to automate as many processes as possible, almost going as far as automating themselves!
This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!
