The Complete Guide to Feature Flags

The term feature flags refers to a set of techniques that allow software developers and teams to change the behavior of their system in production without modifying or even deploying code.

Because they can modify system behavior on the fly, feature flags are very powerful and versatile. They facilitate many use cases that boost developer productivity and make the user experience better and faster: gradual rollouts of new features instead of big-bang releases, testing in production, canary launches, experimentation and many more.

We’ll explore feature flag use cases later. But first, let’s examine feature flags in depth and see how they work.

What is a feature flag?

At the core of this concept lies a dead-simple foundation: conditional code (an if statement) that determines whether or not to perform an action.

This is best explained with an example. I’ll use a real one that I worked on not too long ago. We wanted to allow our users to sign in to our site using their Google account.

To do this, you first create a developer account with Google and get OAuth credentials. Google must approve the application before you can use it on your site; until then, it can only be used in test mode.

Traditionally, the code for Google sign-in would be kept in a separate Git branch. Once Google approves the application, the branch can be merged into develop or main (aka master) and released to users.

The challenge was that the verification process could take several weeks. I tested everything locally to make sure it was all working, but I couldn’t release it yet, even for internal users. If I merged into develop to deploy the feature to the development environment, it would also be released to production (since release branches were automatically cut from develop).

To summarize the issue: I wanted to release an unfinished feature to the internal team for feedback and dogfooding. I had no way to show it to the team without merging it to the develop branch.

Pause here and think about how you’d solve this problem. Scroll up and read again if you just skimmed through.

The first solution that may come to mind is quite simple: introduce a new variable, such as a config parameter (e.g. showSignInButton) or an environment variable (e.g. IS_PRODUCTION). Use it in a conditional (if-else) statement in the code to disable the feature in production while still showing it in the development environment.

And voila! You have arrived at the (most basic) definition of a feature flag.
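In its most basic form, that idea might look like the sketch below. It assumes a Node.js-style environment where the IS_PRODUCTION variable mentioned above is set to the string "true" in production only; the function name and its env parameter are made up for illustration.

```javascript
// Minimal sketch: decide from an environment variable whether to show the button.
// Assumption: IS_PRODUCTION is set to the string "true" in production only.
function shouldShowGoogleSignIn(env) {
  return env.IS_PRODUCTION !== "true";
}

if (shouldShowGoogleSignIn(process.env)) {
  console.log("render Google sign-in button");
} else {
  console.log("hide Google sign-in button");
}
```

Passing the environment in as an argument, rather than reading it inside the function, keeps the check easy to exercise for both cases.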

To use feature flags in our code, we need three core things:

Find the seam where you can hide or disable the feature. In the example above, the seam was the login form container in React. I used a feature flag to decide whether or not to render the sign-in button component.
Determine the feature flag variation. This is usually done by calling a function such as evaluate(), which we’ll explore later. For now, all we need to know is that this function returns “on/off” or “true/false” using some logic.
Use the variation as the condition in an if-then-else statement to determine which block of code to execute: the if block or the else block. In our example, “on” or “true” would render the sign-in button component.

let variation = evaluate("google-sign-in-btn")

if (variation === "on") {
    // code to render Google sign-in component
} else {
    // don’t show it
}

While this is how feature flags generally work, a complete feature flag platform like Unlaunch is much more comprehensive, as we’ll see later.

Let’s take a deep dive into the internals of feature flags and how it all works, including the evaluation logic.

How do feature flags work?

Anatomy of a feature flag

A feature flag itself is nothing but an object or a container that contains the following key properties:

Name or unique identifier: The name of the feature flag must be unique among all flags in its scope.
Variations: A feature flag can have two or more variations (multivariate). Variations are simply strings, e.g. “on” or “off”, that developers use in “if” statements to select the code path. In some feature flagging systems, these could also be “true” or “false”.
(Targeting) Rules: These take in context, such as user attributes from the HTTP request, to determine which variation to return. For example, you can set targeting rules that enable a feature for new users only by returning the “on” variation for them.

There can be many more properties of a feature flag such as whether it is enabled or disabled, dynamic configuration etc.

To recap: A feature flag is an object that contains a bunch of properties such as variations, targeting rules etc.

You can think of a feature flag as a JSON object that describes its properties and state.
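To make that concrete, here is a rough sketch of what such an object could look like. The field names (key, enabled, variations, defaultVariation, rules) are my own illustration and not the schema of any particular tool.

```javascript
// Hypothetical shape of a feature flag; field names are illustrative only.
const googleSignInFlag = {
  key: "google-sign-in-btn",   // unique identifier
  enabled: true,               // whether the flag is switched on at all
  variations: ["on", "off"],   // the strings an evaluation can return
  defaultVariation: "off",     // returned when no targeting rule matches
  rules: [
    // Return "on" for users whose isNewUser attribute is true.
    { attribute: "isNewUser", equals: true, variation: "on" },
  ],
};

// The whole flag serializes cleanly to JSON.
console.log(JSON.stringify(googleSignInFlag, null, 2));
```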

The second central component of a feature flagging system is called the evaluator.

Evaluator

The evaluator is a method that takes two inputs:

the feature flag object (which, as we learned in the last section, contains all the properties), and
context, such as the user ID and attributes from the HTTP request.

The output of the evaluator is always the variation that developers should use in their code to determine the code path. As we learned, variations are strings such as “on” or “off” that are used as conditions in an if statement.

The evaluator uses these two arguments to choose which variation to return. It knows how to evaluate the targeting rules.
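A bare-bones evaluator along those lines might look like the following. This is a simplified sketch, not how any specific platform implements it: the flag shape, the "first matching rule wins" policy and the equality-only rules are all assumptions made for the example.

```javascript
// Sketch of an evaluator: (flag, context) -> variation string.
function evaluate(flag, context) {
  // A disabled flag always returns its default variation.
  if (!flag.enabled) {
    return flag.defaultVariation;
  }
  // Assumed policy: the first rule whose attribute matches the context wins.
  for (const rule of flag.rules) {
    if (context.attributes[rule.attribute] === rule.equals) {
      return rule.variation;
    }
  }
  // No rule matched.
  return flag.defaultVariation;
}

// Hypothetical flag and request context.
const flag = {
  key: "google-sign-in-btn",
  enabled: true,
  defaultVariation: "off",
  rules: [{ attribute: "isNewUser", equals: true, variation: "on" }],
};

const variation = evaluate(flag, { userId: "user-123", attributes: { isNewUser: true } });
console.log(variation); // prints "on"
```

In a real SDK you would typically pass the flag name and let the library look up the flag object for you, as in the evaluate("google-sign-in-btn") call shown earlier.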

Side note: If you’re a Java developer, read this post on how to integrate and use feature flags in Spring Boot applications.

History of Feature Flags

Facebook engineering and research have built a number of amazing products. Apache Cassandra, React and GraphQL are great examples.

While they didn’t invent this technique, Facebook certainly pioneered and made heavy use of feature flags internally.

Facebook is a massive system of interconnected backend services, databases, frontends, interfaces and more. As they grew, they faced a challenge: how to release new features, changes and updates without breaking things.

In other words, how can the massive social networking site enable thousands of engineers to get their code out quickly to users in safe, small and incremental steps?

While a lot has gone into powering continuous delivery at scale at Facebook, pushing out an endless stream of changes every single hour, it was made possible by an internal tool called Gatekeeper.

What is Gatekeeper? You might have already guessed it but it’s Facebook’s internal system for managing feature flags. Jack Lindamood, former Facebook engineer, described it as:

if (gatekeeper_allowed('my_feature_name', $viewing_user_or_application)) { 
  run_this_tested_code(); 
} else { 
  run_this_old_code(); 
} 

From the Facebook engineering blog:

If we do find a problem, we can simply switch the gatekeeper off rather than revert back to a previous version or fix forward.

This quasi-continuous release cycle comes with several advantages:

It eliminates the need for hotfixes.…
It allows better support for a global engineering team.… all engineers everywhere in the world can develop and deliver their code when it makes sense for them.
It makes the user experience better, faster.… when it takes days or weeks to see how code will behave, engineers may have already moved on to something new. With continuous delivery, engineers don’t have to wait a week or longer to get feedback about a change they made. They can learn more quickly what doesn’t work, and deliver small enhancements as soon as they are ready instead of waiting for the next big release. …

While Facebook adopted many practices to power continuous delivery at its massive scale, feature flags were fundamental to their approach.

Today, feature flags are widely used at large and medium-sized companies.

But are they useful only to very large organizations like Facebook, or can small companies and startups benefit from them as well? We’ll discuss this a little later in this post, but the short answer is yes.

What’s in a name? That which we call a rose…

Feature flags are known by several other names in the development community, including feature toggles and feature flippers. For the remainder of this post, we’ll stick to the term feature flag as it’s the most popular.

Feature Flag Use Cases

Earlier I mentioned that feature flags are very versatile and can be used to achieve a variety of tasks. They are useful not just for engineering teams, but equally useful to QA, Operations and Product teams.

Let’s recap the key point that we have learned about feature flags so far:

Feature flags allow developers to modify the behavior of their code at runtime. In other words, they give the ability to ship multiple code paths and choose between them at runtime.

That’s incredibly powerful and can be used in a variety of contexts to achieve many different goals.

Canary Releases and Gradual Roll outs: Rapid Releases at Scale

This is perhaps the most common use case of feature flags that I have encountered.

Traditionally, product features were launched as all-or-nothing. Also known as big-bang releases, this meant making new features available to all users at some cut-off point or the release date. These types of releases require a lot of testing prior to launch and if something goes wrong later, the developers have to patch or get hotfixes out to resolve the bug.

In other words, when the feature is ready, you let it rip.

In canary releases, instead of launching the feature to all your users at the same time, it is released to a small number of users initially. This allows reviewing results such as user sentiment and more concrete metrics like system load, performance etc. Once satisfied, the feature can be launched to a wider group of users.

Canary releases limit the blast radius: if issues are discovered, only a small number of users are impacted. Canary releases have been in use for a long time. Traditionally, they are achieved by having a separate cluster of servers or a replica of the production environment. The load balancer sends a small share of users to the cluster where the new version of the application is deployed. For example, 2% of the traffic goes to the new code and the rest goes to the existing production environment.

Canary releases using Blue Green Deployments

You can also achieve canary launches using feature flags. In fact, feature flags bring several improvements over traditional canary releases, which have a few drawbacks:

Canary clusters are typically controlled by DevOps or Operations teams. Rolling something back requires developers to jump through extra hoops and coordinate with those teams.
If you do need to roll back a feature, all features in the canary cluster are rolled back, including good features that other teams may have deployed.
While canary releases are better than all-or-nothing releases, there are still discrete jumps (0% to 2% to 100%) that may not be ideal for all use cases.
Not all teams have the budget or resources to manage a separate canary cluster.

Feature flags put canary releases on steroids. If I compare traditional canary clusters to monoliths, feature flag based ‘canary’ releases are like microservices. They are small, decentralized and put control back in the hands of teams and developers.

Developers can launch their features when they want and disable just the feature that is misbehaving without impacting other features. Best of all, feature flags do not require building complex and expensive infrastructure to do canary releases.

They also enable a wonderful use case that I personally love: gradual or percentage-based rollouts. These allow teams to slowly ramp up traffic to their feature in increments of any size they want.

For example, when we switched over to a new search backend (Elasticsearch), we initially sent 1% of the traffic to it. Later we increased it to 5% and monitored for several days. We kept increasing it gradually. At 60% traffic, we discovered several issues. We instantly disabled the feature with the click of a button, fixed the issues and resumed the rollout.
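One common way to implement a percentage rollout is to hash the user ID into a stable bucket between 0 and 99 and compare it with the current percentage. The sketch below is my own illustration under that assumption: the hash is an FNV-1a-style function applied to UTF-16 code units, and the function names and the "new-search-backend" flag key are hypothetical.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units (non-cryptographic).
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a user to a stable bucket in [0, 100) for a given flag.
function bucketFor(flagKey, userId) {
  return hashString(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket is below the rollout percentage.
function isInRollout(flagKey, userId, percentage) {
  return bucketFor(flagKey, userId) < percentage;
}

for (const userId of ["alice", "bob", "carol"]) {
  console.log(userId, isInRollout("new-search-backend", userId, 5));
}
```

Because a user's bucket never changes, raising the percentage from 1 to 5 to 60 only ever adds users, and setting it to 0 turns the feature off for everyone. Including the flag key in the hash input means different flags bucket the same user independently.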

I have also seen some teams let their PMs release (low risk) features such as a new blog post on their site, awards lists, etc. in coordination with marketing.

Alpha testing and dogfooding

Dogfooding, aka eating your own dog food or drinking your own champagne, is the practice of using your own products within your company. It’s a great way not only to test products in practical, real-world situations and find bugs, but also to experience your product firsthand the way your users do.

While dogfooding should be an on-going endeavor, it is especially important in the context of new features or major changes and provides an early opportunity for internal users to use the feature and provide meaningful feedback before opening the floodgates.

Feature flags are great for establishing a culture of dogfooding within the company. Release features to production but only for internal teams and allow them to be the first users.

Continuous Delivery

Continuous delivery is not a new concept. Over time, our field has evolved and created many methodologies, such as agile software development, continuous integration and continuous delivery, to name a few, with the goal of delivering changes to users faster, more safely and with higher quality.

The typical git-flow goes like this:

Developers create separate feature branches to work on their features
When the feature is complete, it is merged into the develop branch. This usually results in the feature being deployed to the central development or QA environment.
The code is then merged into the master or release branch and released to production.

Git flow

While this is a great approach, it has a few challenges:

The feature branches can be long running. Even if the feature is complete, the team may have to wait for an external approval or another feature to launch before they can release their code. I have seen cases where feature branches sat in isolation for weeks before they were merged into develop because they depended on the results of an AB test that hadn’t completed.
The feature cannot be released even internally until the feature branch is merged into the develop branch.
I have always held the belief that the best test environment is the production environment. I’m not saying everyone should test only on production, but it is the most accurate. It’s hard to emulate production traffic and patterns in test environments. Following git flow, because we can’t merge our features until ready, we can’t get them to production, even just for ourselves or the team.

Using feature flags, you can merge in-progress feature branches into develop or even main (release) branches. They allow unfinished features to be deployed to production while staying hidden from regular users. Not only does this get rid of long-running git branches, it is also great for dogfooding or showing off features to internal teams for testing or feedback.

For example, we used to regularly push almost-done features to production, enable them for internal users only (by IP or by email domain, e.g. ‘@company.com’) and share links around, including with the CEO and relevant stakeholders. Everyone had access to the production environment, so we received feedback instantly.
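Targeting internal users by email domain can be as small as the sketch below. The user object shape and the function names are assumptions for illustration; the ‘@company.com’ domain is the placeholder from the example above.

```javascript
// Assumption: internal users are identified by this placeholder email domain.
const INTERNAL_EMAIL_DOMAIN = "@company.com";

function isInternalUser(user) {
  return typeof user.email === "string" &&
    user.email.toLowerCase().endsWith(INTERNAL_EMAIL_DOMAIN);
}

// Return "on" for internal users only; everyone else keeps the old behavior.
function evaluateInternalOnly(user) {
  return isInternalUser(user) ? "on" : "off";
}

console.log(evaluateInternalOnly({ email: "ceo@company.com" }));     // prints "on"
console.log(evaluateInternalOnly({ email: "visitor@example.org" })); // prints "off"
```

Keeping the leading "@" in the domain avoids accidentally matching addresses such as someone@notcompany.com.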

If you think about it, in git-flow, merging code is tightly coupled with releasing it to your users. When you merge to develop, it is released to everyone within the company. When you merge it to master, it is released to all users.

Feature flags provide a nice way of separating the two concerns: merging code is decoupled from releasing it. In other words, you can merge unfinished features all day long and only let users see them when they are ready.

AB Testing and Experimentation

Feature flags are great for running AB tests. You can define buckets (variations) and assign percentages to variations.
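As a rough sketch of how buckets and percentages can work together, the function below hashes each user into a number from 0 to 99 and walks through the variations by cumulative weight. The function names, the "checkout-button-color" experiment key and the rule that weights must add up to 100 are assumptions made for this example.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units (non-cryptographic).
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// weights maps variation name -> percentage, e.g. { control: 50, treatment: 50 }.
function assignVariation(experimentKey, userId, weights) {
  const bucket = hashString(`${experimentKey}:${userId}`) % 100;
  let cumulative = 0;
  for (const [variation, weight] of Object.entries(weights)) {
    cumulative += weight;
    if (bucket < cumulative) {
      return variation;
    }
  }
  throw new Error("weights must add up to 100");
}

const weights = { control: 50, treatment: 50 };
console.log(assignVariation("checkout-button-color", "user-42", weights));
```

The same user always lands in the same variation for a given experiment, which matters when you later compare metrics between buckets.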

Product and marketing teams have been running AB tests for a long time. They are great for vetting new ideas quickly, and eliminating bad ones.

AB Testing using feature flags

While the situation is improving, I don’t see engineers running a lot of experiments for technical features that they implement. AB tests are usually pitched by product teams.

I don’t blame engineers entirely: AB testing is a complex business, especially for big product-related features, and requires deep analysis (usually by analytics or data science teams across many KPIs).

But not everything requires that level of in-depth analysis. Sometimes, all that’s needed to call an experiment is the count of errors across the test buckets and controls.

At my last two jobs, we successfully ran many lightweight, engineering-focused experiments. We commonly measured things like response time impact and error counts across variants. Feature flags made it easy to run these and quickly analyze the results.
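For this kind of lightweight experiment, a simple per-variation tally is often enough. The sketch below is a hypothetical in-memory counter (the createTally name and its methods are mine); in practice you would more likely send these counts to your metrics system.

```javascript
// Tally requests and errors per variation, in memory.
function createTally() {
  const counts = new Map();
  return {
    record(variation, isError) {
      const entry = counts.get(variation) ?? { requests: 0, errors: 0 };
      entry.requests += 1;
      if (isError) {
        entry.errors += 1;
      }
      counts.set(variation, entry);
    },
    // Fraction of recorded requests that were errors (0 if none recorded).
    errorRate(variation) {
      const entry = counts.get(variation);
      return entry ? entry.errors / entry.requests : 0;
    },
  };
}

const tally = createTally();
tally.record("control", false);
tally.record("control", false);
tally.record("treatment", false);
tally.record("treatment", true);

console.log(tally.errorRate("control"));   // prints 0
console.log(tally.errorRate("treatment")); // prints 0.5
```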

Kill switches and performance knobs

Feature flags can be used to implement kill switches. These allow teams, usually DevOps or Operations, to gracefully degrade non-essential functionality under load or in the event of outages. For example, a website may disable upload functionality for some or all users if there’s too much load on the backend systems or the system is under attack. Kill switches are related to another concept called the circuit breaker, which enables or disables a certain code path automatically. A kill switch can be thought of as a manual circuit breaker.
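In code, a kill switch for the upload example could look something like this. It is only a sketch: the killSwitches object stands in for state that would normally come from your feature flag system, and the handler and its response shape are hypothetical.

```javascript
// Hypothetical in-memory state; a real system would read this from a flag service.
const killSwitches = { "disable-uploads": false };

function handleUpload(file) {
  if (killSwitches["disable-uploads"]) {
    // Degrade gracefully instead of overloading the backend.
    return { status: 503, body: "Uploads are temporarily unavailable." };
  }
  return { status: 200, body: `Stored ${file.name}` };
}

console.log(handleUpload({ name: "photo.png" }).status); // prints 200

// Someone on call flips the switch during an incident.
killSwitches["disable-uploads"] = true;
console.log(handleUpload({ name: "photo.png" }).status); // prints 503
```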

Performance knobs are similar to kill switches, but instead of completely disabling a feature, they throttle or rate limit a feature or API for a certain class of users. For example, under extreme load, you could restrict access for new users while allowing existing users to continue as normal.

Migrations, especially those that require coordination

Feature flags can be very useful when performing migrations especially those that require coordination.

Feature Flags Lifespan and Dynamism

We explored many use cases of feature flags. In his extensive post on feature flags, Pete Hodgson created a nice graph that plots the types of feature flags he described against their lifespan.

Feature flags fall into two categories with respect to their lifespan: temporary and long-lived (or even permanent). A feature flag you’re using to roll out a new feature is temporary: when the feature is 100% rolled out, the flag can be deleted. On the other hand, a kill switch that controls whether to show or hide a heavy widget on your website is a permanent flag that will always be there.

The graph below plots lifespan (how long feature flags stay active in code) against dynamism (how dynamic a feature flag is). At one end of the spectrum, you have flags like kill switches that don’t care about any context. At the other end, you have feature flags that depend on per-request parameters like type of user and user attributes.

Feature flags lifespan. Temporary vs permanent flags

How to implement feature flags?

There are many ways to implement and use feature flags.

The easiest way is to use hard-coded if-else statements in your code. This approach is not very flexible and defeats the purpose of feature flags, because changes to feature state (enabling, disabling or gradually rolling out) require code deploys.

Taking it one step further, you can make the flag condition configurable, for example by storing it in a database where you can change it on the fly.

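// `db` is a placeholder for your datastore; a Map stands in for it here.
const db = new Map([["enable-latest-widget", true]]);
const enableLatestWidget = db.get("enable-latest-widget");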

if (enableLatestWidget) {
  // show widget
} else {
  // don’t show it
}

While this is better than the first approach and certainly works, it’s very limited.

You can’t do things like gradual rollouts, target users based on their IDs or attributes, run experiments or look at KPIs. There’s no way to look at all the feature flags laid out on a nice dashboard and collaborate with your team in one place.

At least not without writing a lot of code yourself.

Taking it even a step further brings us to complete feature flag management tools like Unlaunch.

Comparing feature flags managed in a database with feature flag tools is like comparing an Excel spreadsheet to Jira (or Trello). Sure, you can track tasks in Excel, but it’s much more efficient to use Jira.

Arguably the biggest benefit these mature feature flag tools provide is that they encourage feature flag and experimentation best practices and help establish that culture organization-wide.

These tools have low overhead. For server-side applications, flags are fetched when the application starts (and then refreshed periodically). All evaluations occur in memory, so they add no network latency to a request.
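The fetch-once, refresh-periodically, evaluate-in-memory pattern can be sketched roughly as follows. This is not the code of any particular SDK: FlagStore and its methods are hypothetical, and fetchFlags is a synchronous stand-in for what would really be a network call.

```javascript
// Sketch of an in-memory flag store; fetchFlags returns { flagKey: variation }.
class FlagStore {
  constructor(fetchFlags) {
    this.fetchFlags = fetchFlags;
    this.flags = {};
  }

  // Replace the in-memory copy with the latest flag values.
  refresh() {
    this.flags = this.fetchFlags();
  }

  // Fetch once now, then refresh every intervalMs milliseconds.
  start(intervalMs) {
    this.refresh();
    const timer = setInterval(() => this.refresh(), intervalMs);
    timer.unref?.(); // in Node.js, don't keep the process alive just for refreshes
    return timer;
  }

  // Pure in-memory lookup: no I/O on the request path.
  variation(flagKey, fallback) {
    return this.flags[flagKey] ?? fallback;
  }
}

// Simulated remote state for the example.
let remoteFlags = { "new-search": "off" };
const store = new FlagStore(() => ({ ...remoteFlags }));
store.refresh();
console.log(store.variation("new-search", "off")); // prints "off"
```

The trade-off is that a change made in the dashboard only takes effect after the next refresh, so the refresh interval bounds how stale a flag can be.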

Unlaunch is a complete feature flag management tool that I have been working on since last year. It is free for solo developers and small teams.

Cleaning up after yourself

This is important enough to warrant its own section.

You must remove feature flags from your code once they are no longer needed. If you launched a new feature using a feature flag, then once it is rolled out to 100% and there’s no risk (or you can no longer disable it), remove the flag from your code. Search for the name of the flag in your codebase and delete both the flag check and the code path that is no longer used.

Feature Flags & Startups

Feature flags are very common in large and medium-sized organizations and have been around for many years.

But I have not seen much feature flag use at smaller companies or startups.

Which raises the question: can startups and smaller companies benefit from using feature flags, or is it just needless overhead?

The answer is that startups could and absolutely should use feature flags.

Startups operate at a much faster pace than large organizations, by their nature. Feature flags are an ideal vehicle to safely and reliably keep releasing new features to your users. By adopting feature flags early, startups can:

build a culture of safe and fast releases.
use feature flags for Alpha testing or dogfooding their products. This is great for teams who don’t have the budget or resources to maintain a separate QA or development environment to test or stage their features.

Last year I was consulting for a small startup that was building chatbots. They didn’t have enough budget to build and maintain a separate QA environment to test things out before releasing to their users. So we built a simple feature-flag-based system that allowed its developers to release features to production but only give access to internal teams (by phone number).

Are you using feature flags in your organization to release new features faster and with confidence? Please comment below to share any tips, ideas or best practices.