DevOps versus SRE – Friend or Foe
by Michiel Mulders
Nowadays, there’s a lack of clarity about the difference between site reliability engineering (SRE) and development and operations (DevOps). There’s definitely an overlap between the roles, even though there are clear distinctions. Where DevOps focuses on automation of deployments and tests, SRE focuses on post-deployment processes—for example, measuring the application’s performance or aggregating logs to make them easily discoverable.
First of all, allow me to explain DevOps. DevOps proposes to merge the development and operations teams. It means an end to just writing your code and throwing it over the wall for the operations team to deploy and test.
No matter the size of the organization, DevOps tries to align processes and improve communication between both teams. When a feature is finished, the developer helps the operations team test the code because they share responsibility for it.
Besides that, DevOps helps improve the code’s quality. Every time a new feature is complete, the team uses automated tools to build and test it. This lets the team find bugs earlier and gives developers faster feedback.
In short, the DevOps culture aligns all involved stakeholders around a shared goal: the delivery of high-quality, stable software.
Now that you understand the mission and purpose of DevOps, let’s move on to SRE.
Here’s how Ben Treynor, the person who developed the SRE role at Google, defines a site reliability engineer:
“Fundamentally, it’s what happens when you ask a software engineer to design an operations function…So SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor.”
To summarize, an SRE is a software engineer with a deep understanding of the code. SREs spend roughly half of their time writing code to improve processes and introduce automation. They’re a crucial part of the development team because they reduce the workload for many developers.
SREs spend the other half of their time monitoring the system’s health. An SRE observes the health of a system by measuring metrics such as availability and reliability. Therefore, the SRE’s work is key to application monitoring and logging.
Differences Between DevOps and SRE
Let’s take a look at some key differences in these roles so you can understand how they complement each other and serve the larger organization.
Difference #1: Background
DevOps: A DevOps engineer has a solid understanding of different system architecture types and is most often fluent with Unix- and Linux-based distributions. These engineers have a deep understanding of the deployment process and all involved elements.
SRE: A site reliability engineer is a software engineer who also has a solid understanding of deployed systems. The combination of software engineering and deployment knowledge makes SREs highly valued.
Next, let’s take a look at how DevOps and SRE differ in terms of measuring metrics.
Difference #2: Metrics
DevOps: Generally speaking, the mindset of DevOps isn’t much concerned with metrics. Instead, DevOps’ core goal is to automate development processes, including testing, deployments, and builds.
SRE: An SRE tracks metrics such as the availability and reliability of services. The SRE collects this data through application monitoring tools and uses it to understand the system’s health. In addition, logging and log aggregation make up a big part of the data the SRE captures.
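To make the availability metric concrete, here is a minimal Python sketch. It assumes availability is measured as the share of successful requests in a time window, which is only one common definition. The `RequestWindow` type, the function names, and the 99.9% target are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestWindow:
    """Request counts observed for one service over a fixed time window."""
    total: int
    failed: int


def availability(window: RequestWindow) -> float:
    """Return the fraction of requests that succeeded (1.0 means no failures)."""
    if window.total == 0:
        # No traffic in the window: report full availability instead of dividing by zero.
        return 1.0
    return (window.total - window.failed) / window.total


def meets_target(window: RequestWindow, target: float = 0.999) -> bool:
    """Compare the measured availability against an assumed target."""
    return availability(window) >= target


window = RequestWindow(total=10_000, failed=25)
print(f"availability: {availability(window):.4f}")   # availability: 0.9975
print("meets 99.9% target:", meets_target(window))   # meets 99.9% target: False
```

In practice, a monitoring tool would supply the request counts; the point here is only that the metric reduces to a simple ratio that can be checked automatically.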
Difference #3: Cultural Shift
DevOps: The origin of DevOps reveals a cultural shift. DevOps wants to merge the operations and development teams. In the DevOps view, development should be more than writing code and throwing the code to the operations team to deploy. The idea is to create a more streamlined process and support a more agile way of working.
SRE: A site reliability engineer is a software engineer who takes care of processes related to scaling and measuring the system’s health. The SRE approach is more than just a cultural shift: SRE helps innovate processes to increase the efficiency of development and operations teams.
Difference #4: Automation
DevOps: DevOps helps provide faster feedback about code through a continuous integration (CI) pipeline. It takes care of test automation but also builds the software for certain platforms, creating artifacts. A CI tool can execute tests in parallel, which makes test execution much faster than doing so locally on an engineer’s laptop.
In short, DevOps is concerned with process automation related to the development and deployment phase.
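The parallel test execution described above can be sketched in a few lines of Python. This is an illustration only: real CI tools usually spread tests across separate processes or machines, whereas this sketch uses threads from the standard library. The three `check_*` functions are hypothetical stand-ins for real tests, and one of them fails on purpose.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_addition() -> None:
    time.sleep(0.1)  # stand-in for slow test setup
    assert 1 + 1 == 2


def check_upper() -> None:
    time.sleep(0.1)
    assert "sre".upper() == "SRE"


def check_broken() -> None:
    time.sleep(0.1)
    assert sum([1, 2]) == 4, "deliberately failing check"


def run_one(test) -> tuple[str, bool]:
    """Run a single check and report (name, passed)."""
    try:
        test()
        return test.__name__, True
    except AssertionError:
        return test.__name__, False


def run_in_parallel(tests, workers: int = 3) -> dict[str, bool]:
    """Run all checks concurrently and collect their outcomes by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list.
        return dict(pool.map(run_one, tests))


results = run_in_parallel([check_addition, check_upper, check_broken])
print(results)
# {'check_addition': True, 'check_upper': True, 'check_broken': False}
```

Because the three checks wait at the same time rather than one after another, the whole run takes about as long as the slowest check.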
SRE: SRE professionals focus on automating processes related to scaling and measuring the system’s health. This means they mostly focus on automating processes related to post-deployment. For example, an SRE professional might aggregate information in a dashboard about the server’s health, monitoring memory usage and CPU allocation. Also, as mentioned previously, application performance management helps the SRE aggregate logs and make them more discoverable.
In brief, an SRE collects data about running services so he or she can quickly solve issues in case of a system crash. Automating that data collection is therefore a key part of the SRE’s work.
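As a rough illustration of the server-health dashboard mentioned above, the following Python sketch aggregates CPU and memory samples per server. The `HealthSample` type, the server names, and the 90% memory limit are assumptions made for this example; a real setup would read these values from a monitoring agent.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HealthSample:
    """One measurement taken on one server (values in percent)."""
    server: str
    cpu_percent: float
    memory_percent: float


def summarize(samples: list[HealthSample]) -> dict[str, dict[str, float]]:
    """Group samples by server and compute average CPU and peak memory."""
    by_server: dict[str, list[HealthSample]] = {}
    for sample in samples:
        by_server.setdefault(sample.server, []).append(sample)
    return {
        server: {
            "avg_cpu": mean(s.cpu_percent for s in group),
            "max_memory": max(s.memory_percent for s in group),
        }
        for server, group in by_server.items()
    }


def servers_over_limit(summary: dict[str, dict[str, float]],
                       memory_limit: float = 90.0) -> list[str]:
    """Return the servers whose peak memory usage exceeded the assumed limit."""
    return sorted(name for name, stats in summary.items()
                  if stats["max_memory"] > memory_limit)


samples = [
    HealthSample("web-1", cpu_percent=40.0, memory_percent=70.0),
    HealthSample("web-1", cpu_percent=60.0, memory_percent=95.0),
    HealthSample("web-2", cpu_percent=20.0, memory_percent=30.0),
]
summary = summarize(samples)
print(summary["web-1"])             # {'avg_cpu': 50.0, 'max_memory': 95.0}
print(servers_over_limit(summary))  # ['web-1']
```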
Difference #5: Dealing With Bugs and Application Failures
DevOps: A DevOps engineer typically lacks a deep understanding of the code. In case of an error, the DevOps engineer can roll back the code to a previous minor version to restore service. The goal is to minimize the impact of an issue and restore service as soon as possible. After that, the development team should investigate and fix the issue.
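Choosing which version to roll back to can itself be automated. The sketch below assumes releases are labeled with dotted numeric versions such as `1.9.2`; the function names and the release list are hypothetical. It picks the newest deployed release that is older than the one currently failing.

```python
from typing import Optional


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def rollback_target(deployed: list[str], current: str) -> Optional[str]:
    """Return the newest deployed release that is older than `current`."""
    older = [v for v in deployed if parse(v) < parse(current)]
    return max(older, key=parse) if older else None


deployed = ["1.4.0", "1.5.0", "1.10.0", "1.9.2"]
print(rollback_target(deployed, "1.10.0"))  # 1.9.2
print(rollback_target(deployed, "1.4.0"))   # None
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "1.10.0" would sort before "1.9.2".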
SRE: In contrast, a site reliability engineer has a deep understanding of the software. This allows him or her to solve nasty bugs on the spot using the data he or she keeps track of. Tracking relevant data greatly eases the debugging process. An SRE can also be called on to solve trickier problems, including performance issues and memory leaks.
In case of a crash, an SRE can use the aggregated logs to play back the events that led up to the crash. Having this data readily available helps the SRE to solve those problems quickly.
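Playing back the events before a crash can be sketched as merging the logs of several services into one timeline. The following Python example is a simplified illustration: the `LogEvent` type, the service names, the messages, and the five-minute lookback window are all invented for this example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class LogEvent(NamedTuple):
    timestamp: datetime
    service: str
    message: str


def events_before_crash(streams: list[list[LogEvent]],
                        crash_time: datetime,
                        lookback: timedelta = timedelta(minutes=5)) -> list[LogEvent]:
    """Merge per-service logs and return, in time order, the events
    that fall inside the lookback window ending at the crash."""
    merged = [event for stream in streams for event in stream]
    window_start = crash_time - lookback
    recent = [e for e in merged if window_start <= e.timestamp <= crash_time]
    return sorted(recent, key=lambda e: e.timestamp)


api_log = [
    LogEvent(datetime(2020, 3, 1, 11, 50, 0), "api", "service started"),
    LogEvent(datetime(2020, 3, 1, 11, 58, 30), "api", "request timed out"),
]
db_log = [
    LogEvent(datetime(2020, 3, 1, 11, 57, 0), "db", "slow query detected"),
    LogEvent(datetime(2020, 3, 1, 11, 59, 10), "db", "out of memory"),
]

crash = datetime(2020, 3, 1, 12, 0, 0)
for event in events_before_crash([api_log, db_log], crash):
    print(event.timestamp.time(), event.service, event.message)
# 11:57:00 db slow query detected
# 11:58:30 api request timed out
# 11:59:10 db out of memory
```

Seeing the events from both services in a single ordered list makes it easier to spot that the database trouble started before the API timeouts.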
In short, a DevOps engineer doesn’t have the required knowledge of the software to solve the problem, whereas an SRE can use his or her data to solve the problem.
Conclusion: DevOps and SRE Are Both Necessary
To summarize, DevOps is concerned with automating deployments and tests, whereas SRE focuses on tasks after deployment. SRE works on automating tasks related to availability and the system’s health.
DevOps and SRE have many common elements. For example, both fields have an interest in automation. The main difference is in which stage they are active. DevOps focuses on the development and deployment phase. DevOps empowers developers and provides them with fast feedback about their code. In contrast, SRE is concerned with post-deployment automation and innovating related processes. In the end, your organization needs to understand there’s a difference between the roles.
Although both terms are often used interchangeably, for large organizations, it makes sense to have both DevOps and SRE. SREs play a critical role in making sure a service is available because they can quickly debug problems based on their knowledge of the software.
In smaller organizations, the same person or team often executes SRE and DevOps. Although this double duty is sometimes necessary, it can put too much stress on your operations team. Often, startups like to use software engineers who can help with DevOps, thereby creating an indirect SRE role.
In essence, a site reliability engineer should try to automate as many processes as possible, almost going as far as automating himself or herself out of the job!
This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!