It is no wonder that the highest-performing companies in the IT industry are also the ones that push the most delivery responsibility onto their engineers, in contrast to the classical chain of handovers meant to ensure “segregation of duties”. There is an inverse relationship between how much segregation of duties a company enforces and how efficient it is in terms of delivery lead time. This is especially visible in regulated companies such as banks, where a failed deployment or a security flaw can have more serious consequences.
There is some sense behind this, which is why these companies do it, but I propose a better way: let most of the company be as agile as a startup, and restrict only the mission-critical systems to a more rigid delivery procedure, with multiple chains of approvals that make continuous delivery inconvenient.
It starts with a risk analysis
First, do a risk analysis of your system to identify the critical parts, which helps you determine which delivery process is appropriate for each part. The parts where a failure does not have big consequences should be deployable many times a day, just like in an unregulated company. These parts include the front-end apps (excluding login pages) and non-critical back-end code. Make sure these parts are decoupled from the riskier parts, so that you can jump right into implementing continuous delivery for them.
Let’s consider an online banking application, for example. What are the worst-case scenarios we should mitigate? Probably:
- Flawed payments
- Incorrect balances on accounts
- Personal data leak
Knowing this, you want to do extra due diligence when changing these parts of the system, which might require extra segregation of duties and approvals. This is not primarily for quality control, which such approvals normally do little for (for example, when a non-technical manager approves a code change they don’t understand). It is more a matter of governance: it enables the board and senior management to control changes to these areas and makes sure a manager is responsible.
That being said, the rate of errors normally decreases and mean time to recovery improves when practicing continuous delivery, so even doing this for the mission-critical systems should make them less risky than “big bulk releases” every month. Still, it would be a hard political battle to get this way of working adopted for the mission-critical areas, as it conflicts with core principles in many highly regulated companies.
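The outcome of such a risk analysis can be written down as a simple mapping from system parts to delivery processes. The following is a minimal sketch in Python, not a definitive implementation. The component names, the three risk flags (taken from the worst-case list above), and the two track labels are all hypothetical, and treating the login page as touching personal data is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DeliveryTrack(Enum):
    CONTINUOUS = "continuous delivery"
    CONTROLLED = "segregated duties and manager approval"


@dataclass(frozen=True)
class Component:
    # Hypothetical risk flags, one per worst-case scenario from the list above.
    name: str
    affects_payments: bool = False
    affects_balances: bool = False
    handles_personal_data: bool = False


def delivery_track(component: Component) -> DeliveryTrack:
    """Pick the stricter track if a failure could cause any worst-case scenario."""
    is_critical = (
        component.affects_payments
        or component.affects_balances
        or component.handles_personal_data
    )
    return DeliveryTrack.CONTROLLED if is_critical else DeliveryTrack.CONTINUOUS


# Illustrative components only; a real analysis would list your own systems.
components = [
    Component("marketing-frontend"),
    Component("login-page", handles_personal_data=True),  # assumption
    Component("payment-engine", affects_payments=True, affects_balances=True),
]

for c in components:
    print(f"{c.name}: {delivery_track(c).value}")
```

Keeping the classification this explicit makes it easy to see which parts are held back by the stricter process, and why.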
The right tool for the right job
I hear again and again a wish for everything in a business to become “streamlined”. This covers design, technologies, and processes. The idea behind it is good, but I don’t think it is worth sacrificing one team’s efficiency just because another team does things differently. Teams, just like software services, have contracts with each other that enable them to work together, but they should not depend on each other’s internals beyond what the contract covers. For that reason, if one team benefits from using tool A and process A, it is perfectly okay for another team to use tool B and process B, as long as they do not depend on each other’s tools and processes. Too much coupling between teams, in both communication and technology, often causes teams to block each other. That is why a lot of companies adopt feature teams, grouping people by the features they work on.
Split the risky from the non-risky
I recommend that the critical systems be decoupled from the rest, both in terms of the development teams and the servers hosting them. These critical systems are the only place where you should enforce strict classical segregation of duties and manager approvals for each release.
The other systems should be hosted on their own infrastructure and can be developed and deployed independently with modern extreme programming (XP) practices and continuous delivery practices such as:
- All code is covered by automated tests.
- Code can go to production once it has passed a review by a peer.
- There are no restrictions on when you can deploy and no “freeze” periods.
This will enable most of a regulated business to deploy to production daily while limiting the strict procedure to where it is necessary.
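The two delivery tracks described above can be sketched as a deployment gate. This is only an illustration under stated assumptions: the `Change` fields and the `may_deploy` function are hypothetical names, and requiring tests and peer review on the critical track as well, plus a manager who is neither the author nor the reviewer, is one possible reading of “segregation of duties”, not a prescribed rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    # Hypothetical record of a change waiting to go to production.
    author: str
    tests_passed: bool
    peer_reviewer: Optional[str] = None
    manager_approver: Optional[str] = None


def may_deploy(change: Change, critical_system: bool) -> bool:
    """Return True if the change may go to production right now."""
    # Both tracks (assumption): automated tests pass and a peer other than
    # the author has reviewed the code.
    reviewed = (
        change.peer_reviewer is not None
        and change.peer_reviewer != change.author
    )
    if not (change.tests_passed and reviewed):
        return False

    if not critical_system:
        # Non-critical track: no freeze periods and no further approvals.
        return True

    # Critical track (assumption): a manager who is neither the author
    # nor the reviewer must sign off on the release.
    return (
        change.manager_approver is not None
        and change.manager_approver not in (change.author, change.peer_reviewer)
    )
```

For example, `may_deploy(Change("ana", True, peer_reviewer="bo"), critical_system=False)` passes the gate, while the same change on a critical system is held until a separate manager approves it.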