A significant share of architectural energy is spent on reducing or avoiding lock-in. That’s a rather noble objective: architecture is meant to give us options and lock-in does the opposite. However, lock-in isn’t a simple true-or-false matter: avoiding being locked into one aspect often locks you into another. Also, popular notions, such as open source automagically eliminating lock-in, turn out to be not entirely true. Time to have a closer look at lock-in, so you don’t get locked up into avoiding it!
One of an architect’s major objectives is to create options. Those options make systems change-tolerant, so we can defer decisions until more information becomes available or react to unforeseen events. Lock-in does the opposite: it makes switching from one solution to another difficult. Many architects may therefore consider it their archenemy while they view themselves as the guardians of the free world of IT systems where components are replaced and interconnected at will.
Lock-in – an architect’s archenemy?
But architecture is rarely that simple – it’s a business of trade-offs. Experienced architects know that there’s more behind lock-in than proclaiming that it must be avoided. Lock-in has many facets and can even be the favored solution. So, let’s get in the Architect Elevator to have a closer look at lock-in.
The platforms we are deploying software on these days are becoming ever more powerful – modern cloud platforms not only tell us whether our photo shows a puppy or a muffin, they also compile our code, deploy it, configure the necessary infrastructure, and store our data.
This great convenience and productivity booster also brings a whole new form of lock-in. Hybrid/multi-cloud setups, which seem to attract many architects’ attention these days, are a good example of the kind of things you’ll have to think of when dealing with lock-in. Let’s say you have an application that you’d like to deploy to the cloud. Easy enough to do, but from an architect’s point of view, there are many choices and even more trade-offs, especially related to lock-in.
You might want to deploy your application in containers. That sounds good, but should you use AWS’ Elastic Container Service (ECS) to run them? After all, it’s proprietary to Amazon’s cloud. Prefer Kubernetes? It’s open source and runs on most environments, including on premises. Problem solved? Not quite – now you are tied to Kubernetes – think of all those precious YAML files! So you traded one lock-in for another, didn’t you? And if you use a managed Kubernetes services such as Google’s GKE or Amazon’s EKS, you may also be tied to a specific version of Kubernetes and proprietary extensions.
If you need your software to run on premises, you could also opt for AWS Outposts, so you do have some options. But that again is proprietary. It integrates with VMWare, which you are likely already locked into, so does it really make a difference? Google’s equivalent, freshly minted Anthos, is built from open-source components, but nevertheless a proprietary offering: you can move applications to different clouds – as long as you keep using Anthos. Now that’s the very definition of lock-in, isn’t it?
Alternatively, if you neatly separate your deployment automation from your application run-time, doesn’t that make it fairly easy to switch infrastructure, reducing the effect of all that lock-in? Hey, there are even cross-platform infrastructure-as-code tools. Aren’t those supposed to make these concerns go away altogether?
For your storage needs, how about AWS S3? Other cloud providers offer S3-compatible APIs, so can S3 be considered multi-cloud compatible and lock-in free, even though it’s proprietary? You could also wrap all your data access behind an abstraction layer and thus localize any dependency. Is that a good idea?
It looks like avoiding lock-in isn’t quite so easy and might even get you locked up into trying to escape from it. To highlight that cloud architecture is fun nevertheless, I defer to Simon Wardley’s take on hybrid cloud.
Lock-in isn’t an all-or-nothing affair.
Elevator Architects (those who ride the Architect Elevator up and down) see shades of gray where many only see black and white. When thinking about system design, they realize that common attributes like lock-in or coupling aren’t binary. Two systems aren’t just coupled or decoupled just like you aren’t simply locked into a product or not. Both properties have many nuances. For example, lock-in breaks down into numerous dimensions:
Open source software isn’t a magic cure for lock-in.
In summary, lock-in is far from an all-or-nothing affair, so understanding the different flavors can help you make more conscious architecture decisions. The list also debunks common myths, such as using open source source software magically eliminating lock-in. Open source can reduce vendor lock-in, but most of the other types of lock-in remain. This doesn’t mean open source is bad, but it isn’t a magic cure for lock-in.
Experienced architects not only see more shades of gray, they also practice good decision discipline. That’s important because we are much worse decision makers than we commonly like to believe – a quick read of Kahneman’s Thinking, Fast and Slow is in order if you have any doubt.
One of the most effective ways to improve your decision making is to use models. Even, or especially, simple models are surprisingly effective at improving decision making:
Simple but evocative models are the signature of the great scientist, but over-elaboration and over-parameterization is often the mark of mediocrity.
That’s why you shouldn’t laugh at the famed two-by-two matrix that’s so beloved by management consultants. It’s one of the simplest and therefore most effective models as we shall soon discover.
The more uncertain the environment, the more structured models can help you make better decisions.
There’s a second important point about models: a common belief tells us that in face of uncertainty you pretty much have to “shoot from the hip” – after all everything is in flux, anyway. The opposite is actually true: our generally poor decision making only gets worse when we have to deal with many interdependencies, high degrees of uncertainty, and small probabilities. Therefore, this is where models help the most to bring much needed structure and discipline into our decision-making. Deciding on whether and to what degree to accept lock-in falls well into this category, so let’s use some models.
A simple model can help us get past the “lock-in = bad” stigma. First, we have to realize that it’s difficult to not be locked into anything, so some amount of lock-in is inevitable. Second, we may happily accept some amount of lock-in if we get a commensurate pay-off, for example in form of a unique feature or utility that’s not offered by competitive products.
Let’s express these factors in a very simple model – a two-by-two matrix:
The matrix outlines our choices along the following axes:
We can now consider each of the four quadrants:
While the model is admittedly simple, placing your software (and perhaps hardware) components into this matrix is a worthwhile exercise. It not only visualizes your exposure but also communicates your decisions well to a variety of stakeholders.
For an every-day example of the four quadrants, you may have decided to use following items, which give you varying amounts of lock-in and utility (counter-clockwise from top-right):
A unique product feature doesn’t always translate into unique utility for you.
One word of caution on the unique utility: every vendor is going to give you some form of unique feature – that’s how they differentiate. However, what counts here is whether that feature translates into a concrete and unique value for you and your organization. For example, some cloud providers run Billion-user services over their amazing global network. That’s impressive and unique, but unlikely to be a utility for the average enterprise who’s quite happy to serve 1 million customers and may be restricted to doing business in a single country. Some people still buy Ferraris in small countries with strict speed limits, so apparently not all decision making is entirely rational, but perhaps a Ferrari gives you utility in more ways than a cloud platform can.
Because this simple matrix was so useful, let’s do another one. The previous matrix treats switching cost as a single element (or dimension). A good architect can see that it breaks down into two dimensions:
The matrix differentiates between the cost of making the switch from the likelihood that you’ll have (or want) to make the switch. Things that have a low likelihood and a low cost shouldn’t bother you much while the opposite end, the ones with high switching cost and a high chance of switch, are no good and should be addressed. On the other diagonal, you are taking your chances on those options that will cost you, but are unlikely to occur – that’s where you’ll want to buy some insurance, for example by limiting the scope of change or by padding your maintenance budget. You could also accept the risk – how often would you really need to migrate off Oracle onto DB2, or vice versa? Lastly, if switches are likely but cheap, you achieved agility – you embrace change and designed your system for low cost of executing it. Oddly, this quadrant often gets less attention than the top left despite many small changes adding up quickly. That’s our poor decision making at work: the unlikely drama gets more attention because what if!
When discussing the likelihood of lock-in, you’ll want to consider a variety of scenarios that’ll make you switch: a vendor may go out of business, raise prices, or may no longer be able to support your scale or functional needs. Interestingly, the desire to reduce lock-in sometimes comes in form of a negotiation tool: when negotiating license renewals you can hint your vendor that you architected your system such that switching away from their product is realistic and inexpensive. This may help you negotiate a lower price because you’ve communicated that your BATNA – your Best Alternative To a Negotiated Agreement is low. This is an architecture option that’s not really meant to be used – it’s a deterrent, sort of like a stockpile of weapons in a cold war. You might be able to fake it and not actually reduce lock-in, but you better be a good poker player in case the vendor calls your bluff, e.g. by chatting with your developers at the water cooler.
Pulling in our options analogy from the very beginning once more, if avoiding lock-in gives you options, then the cost of making the switch is the option’s strike price: it’s how much you pay to execute the option. The lower the switching cost you want to achieve, the higher is the option’s value and therefore the price. While we’d dream of having all systems in the “green boxes” with minimal switching cost, the necessary invest may not actually pay off.
Minimizing switching costs may not be the most economical choice.
For example, many architects favor not being locked into a database vendor or cloud provider. However, how likely is a switch really? Maybe 5%, or even lower? How much will it cost you to bring that switching cost down from let’s say $50,000 (for a semi-manual migration) to near zero? Likely a lot more than the $2,500 ($50,000 x 5%) you can expect to save. Therefore, minimizing the switching cost isn’t the sole goal and can easily lead to over-invest. It’s the equivalent of being over-insured: paying a huge premium to bring the deductible down to zero may give you peace of mind, but it’s often not the most economical, and therefore, rational, choice.
A final model (for once not a matrix) can help you decide how much you should invest into reducing the cost of making a switch. The following diagram shows your liability, defined as the product of switching cost times the likelihood that it occurs in relation to the up-front invest you need to make (blue line).
By investing in options, you can surely reduce your liability, either by reducing the likelihood of a switch or by reducing the cost of executing it. For example, using an Object-relational Mapping (ORM) framework like Hibernate is a small investment that can reduce database vendor lock-in. You could also create a meta-language that is translated into each database vendor’s native stored procedure syntax. It’ll allow you to fully exploit the database’s performance without being dependent, but it’s going to take a lot of up-front effort for a relatively unlikely scenario.
The interesting function therefore is the red line, the one that adds the up-front invest to the potential liability. That’s your total cost and the thing you should be minimizing. In most cases, with increasing up-front invest, you’ll move towards an optimum range. Additional investment into reducing lock-in actually leads to higher total cost. The reason is simple: the returns on investment diminish, especially for switches that carry a small probability. If we make our architecture ever-so-flexible, we are likely stuck in this zone of over-investment. The Yagni (you ain’t gonna need it) folks may aim for the other end of the spectrum – as so often, the trick is to find the happy medium.
Now that we have a pretty good grip on the costs and potential pay-offs of being locked in, we need to have a closer look at the total cost of avoiding lock-in. In the previous model we assumed that avoiding lock-in is a simple cost. In reality, though, this cost can be broken down into several components:
Complexity can be the biggest price you pay for reducing lock-in.
When calculating the cost of avoiding lock-in, an architect should make a quick run down this list to avoid blind spots. Also, be aware that attempts at avoiding lock-in can be leaky, very much like leaky abstractions. For example, Terraform is a fine tool, but its scripts use many vendor-specific constructs. Implementation details thus “leak” through, rendering the switching cost from one cloud to another decidedly non-zero.
With so much theory, let’s look at a few concrete examples.
I worked with a company who packages much of their code into Docker containers that they deploy to AWS ECS. Thus they are locked into AWS. Should they invest into replacing their container orchestration with Kubernetes, which is open source? Given that feature velocity is their main concern and the current ECS solution works well for them, I don’t think a migration would pay off. The likelihood of having to switch to another cloud provider is low and they have “bigger fish to fry”.
Recommendation: accept lock-in.
Many applications use a relational database that can be provided by numerous vendors and open source alternatives. However, SQL dialects, stored procedures, and bespoke management consoles all contribute to database lock-in. How much should you invest into avoiding this lock-in? For most languages and run-times common mapping frameworks such as Hibernate provide some level of database neutrality at a low cost. If you want to further minimize your strike price, you’d also need to avoid SQL functions and stored procedures, which may make your product less performant or require you to spend more on hardware.
Recommendation: use low-effort mechanisms to reduce lock-in. Don’t aim for zero switching cost.
Rather than switching from one database vendor to another, you may be more interested in moving your application, including its database, to the cloud. Besides technical considerations, you’ll need to be careful with some vendors’ licensing agreements that may make such a move uneconomical. In these cases, it’s wise to opt for an open source database.
Recommendation: select an open source database if it can meet your operational and support needs, but accept some degree of lock-in.
Many enterprises are fascinated the idea of portable multi-cloud deployments and come up with ever more elaborate and complex (and expensive) plans that’ll ostensibly keep them free of cloud provider lock-in. However, most of these approaches negate the very reason you’d want to go to the cloud: low friction and the ability to use hosted services like storage or databases.
Recommendation: Exercise caution. Read my article on multi-cloud.
It may seem that one can put an enormous amount of time contemplating lock-in. Some may even dismiss our approach as “academic”, a word which I repeatedly fail to see as something bad because that’s where most of us got our education. Still, isn’t the old black-or-white method of architecture simpler and, perhaps, more efficient?
Architectural thinking is actually surprisingly fast if you focus and stick to simple models.
In reality thinking actually happens extremely fast. Running through all the models shown in this article may really just take a few minutes and yields well-documented decisions. No fancy tooling besides a piece of paper or a whiteboard is required. The key ingredient into fast architectural thinking is merely the ability to focus.
Compare that to the effort to prepare elaborate slide decks for lengthy steering committee meetings that are scheduled many weeks in advance and usually don’t have anyone attend who has the actual expertise to make an informed decision