Continuous Integration as a service (Travis CI, CircleCI, and plenty of others) has been commonplace for a while. These services are widely used to validate proposed changes. However, there are far fewer examples of using cloud-based continuous integration tools to also do continuous deployment.

This is exactly the question I was asked recently. Instead of having separate integration and deployment tools, some of which we purchase as a service and others my team operates internally, would it be possible to use a single tool for both purposes? Can we have full workflow automation, a unified user interface, and the simplicity of having a single vendor run the entire stack?

Pondering this question requires agreeing on the basics of what continuous integration and deployment actually mean, looking at the similarities and differences between them, and considering the issues we can run into on the boundary between the two.

All of this discussion is presented below, starting from the questions of “What exactly do continuous integration and deployment mean to me?” and “Where does one stop and the other begin?”

Continuous integration

The process of continuous integration (CI) has two goals. The first goal is to validate every proposed code change by building the product and running a series of tests to verify its functionality. When every change is validated in this way, changes can be confidently merged, or integrated, into a shared codebase without breaking the product. This is the traditional goal of CI as described by books, Wikipedia, and so on.

The second goal optimizes for future deployment. Since the first goal already builds the product in order to test it, the process can use the build results to generate everything needed to deploy the product in the future. Let’s call this output an “artifact”. Preparing an artifact in advance saves time during deployment, and artifacts remain available even if the source code that generated them no longer exists (for example, if the branch has since been force-pushed over).
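
As a rough sketch of what that packaging step might look like, here is a bit of Python that bundles a build directory into a versioned artifact and copies it to shared storage. The service name, bucket, and use of the AWS CLI are illustrative assumptions, not a prescription.

```python
# Hypothetical packaging step at the end of a CI build. The artifact name,
# bucket, and upload command are assumptions made for illustration only.
import subprocess
import tarfile

def package_artifact(build_dir: str, commit_sha: str) -> str:
    """Bundle the build output so a later deploy never needs the source tree."""
    artifact = f"myservice-{commit_sha}.tar.gz"
    with tarfile.open(artifact, "w:gz") as tar:
        tar.add(build_dir, arcname="myservice")
    # Store the artifact somewhere any future deploy can reach it.
    subprocess.run(
        ["aws", "s3", "cp", artifact, f"s3://example-artifacts/{artifact}"],
        check=True,
    )
    return artifact
```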

Both validation and packaging should be done on the shared codebase (the mainline) and on the code in development (feature branches). Developers may want to roll out code from feature branches to test environments to quickly check things out; code may break following a merge because of changes that landed in between, even if the feature branch’s tests were successful; and production deploys usually must come from the mainline.

Continuous deployment

The goal of continuous deployment (CD) is to deploy a newly available version of the product without requiring human toil or causing an outage. In its simplest implementation, this can be done by SSHing into a host and installing a newer version of a package. In a distributed highly-available system, a deployment workflow could involve gradually rolling out the new version across a fleet of machines while watching the error rates and, if a problem develops, returning the entire system to its previous good state.
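
To make this concrete, here is a minimal sketch of such a gradual rollout. The stage sizes, the bake time, and the deploy_to/error_rate/rollback helpers are hypothetical stand-ins for whatever a real platform provides.

```python
# Minimal sketch of a gradual ("canary") rollout with automatic rollback.
import time

CANARY_STAGES = [0.05, 0.25, 1.0]   # fraction of the fleet at each stage
BAKE_SECONDS = 600                  # intentional delay to collect data on the new version
ERROR_THRESHOLD = 0.01

# The three helpers below are hypothetical stand-ins for the real platform.
def deploy_to(hosts, version): print(f"deploying {version} to {len(hosts)} hosts")
def error_rate(hosts): return 0.0
def rollback(hosts, version): print(f"rolling {len(hosts)} hosts back to {version}")

def rollout(new_version: str, last_good: str, fleet: list[str]) -> bool:
    for fraction in CANARY_STAGES:
        targets = fleet[: max(1, int(len(fleet) * fraction))]
        deploy_to(targets, new_version)
        time.sleep(BAKE_SECONDS)                 # watch the new version before expanding
        if error_rate(targets) > ERROR_THRESHOLD:
            rollback(fleet, last_good)           # return the entire system to its previous good state
            return False
    return True
```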

While continuous deployment can be defined as deploying only to production and only from the mainline, there are benefits to configuring CD to operate on non-production environments and feature branches as well. Nobody likes doing extra work, and when a change pushed to source control is automatically deployed shortly afterwards to an environment of the developer’s choice, the developer can stay focused on the work itself.

Non-production use requires more flexibility. A developer may want to deploy from any point in the source history, make frequent manual changes, and “freeze” a particular environment so that it does not pick up any new changes until her investigation is complete. In a development workflow, flexible delivery is the ideal, not continuous deployment.

When useful artifacts are produced and stored during CI, the delivery process speeds up. In fact, it may be possible for a CD process to never look at the source code repository at all, if the deployment configuration is found elsewhere or is packaged as part of the artifact.

Differences between continuous integration and deployment

On the surface, CI and CD look quite similar: either process can be represented as a workflow (“pipeline”) of arbitrary commands executed in a shared, state-preserving environment. Some steps must be done in sequence, some can branch out to run in parallel, and others can only proceed when some or all of the previous steps have completed. It is tempting to use the same tool to represent both processes as a single seamless workflow. There are, however, several important differences to keep in mind.
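
A sketch of that shared shape: either process reduces to a set of steps and their dependencies, with parallelism falling out of the dependency graph. The step names below are purely illustrative.

```python
# Either a CI or a CD process can be modeled as steps plus dependencies;
# anything whose dependencies are satisfied may run in parallel.
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    depends_on: list[str] = field(default_factory=list)

pipeline = [
    Step("checkout"),
    Step("build", depends_on=["checkout"]),
    Step("unit-tests", depends_on=["build"]),          # runs alongside integration-tests
    Step("integration-tests", depends_on=["build"]),
    Step("package-artifact", depends_on=["unit-tests", "integration-tests"]),
]

def ready(done: set[str]) -> list[str]:
    """Steps whose dependencies have all completed and that may start now."""
    return [s.name for s in pipeline
            if s.name not in done and all(d in done for d in s.depends_on)]
```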

The most important difference is isolation. A CI process should not have any side effects on any production or development environment. CI tests and packaging should run in a self-contained manner. Isolation is the reason why multiple CI jobs for the same product, on different branches and code states, can run concurrently without interference. Isolation is a crucial requirement for projects where heavy development happens on a shared codebase.
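
In its simplest form, isolation is just a throwaway workspace (or container) per job. Here is a rough sketch, with the repository URL and the test command as placeholders.

```python
# Sketch of isolation: every CI job gets a throwaway workspace, so concurrent
# jobs on different branches cannot interfere with each other or with any
# real environment. The repo URL and "make test" are placeholders.
import subprocess
import tempfile

def run_ci_job(repo_url: str, commit_sha: str) -> int:
    with tempfile.TemporaryDirectory() as workspace:   # nothing outside this dir is touched
        subprocess.run(["git", "clone", repo_url, workspace], check=True)
        subprocess.run(["git", "checkout", commit_sha], cwd=workspace, check=True)
        result = subprocess.run(["make", "test"], cwd=workspace)
        return result.returncode                        # workspace is deleted on exit
```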

A CD process, by contrast, is pointless if it does not have the “side effect” of deploying a new product version to a given environment. Therefore, unlike CI processes, multiple CD runs targeting the same environment cannot run concurrently.

There is another difference in usage that arises from the isolation/deployment distinction. It generally does not make sense to trigger a CI process for a given version of the source again, since the result is likely to be identical. However, deploy processes for a given source version are often triggered multiple times. The common scenario is different deploy targets, such as a developer’s testing environment and production. But even if the target is the same, a deploy can be repeated if it is a rollback from a bad deploy of a later version, or when a developer steps out to lunch and does not want to leave a buggy version in a testing environment used by multiple people.

The next difference I want to talk about is time. CI should run as fast as possible, with unit test times on the order of seconds being the best practice. There are no intentional delays in a CI process since the goal is to validate the source as quickly as possible. But a well-built gradual CD process often includes intentional delays, for collecting data on the performance and error rate of the newly deployed version before deciding to either roll back or increase the deployment from a small percentage (“canary”) of the fleet to the entirety of it. Queuing or waiting for a lock also can take considerable time, much longer than would be reasonable for a CI process.

Finally, CI and CD have different ideal responses to resource constraints. Both CI and CD systems have limited concurrency, whether from the operational constraints of systems we own (the number of available build agents) or from the contractual limits of third-party services (say, up to 100 concurrent containers in use). What is the ideal response of these systems when new runs are queued but no resources are available to launch them?

On the mainline, I would like to run CI on all commits, preferably the most recent ones first, backfilling older commits when spare resources are available, so that any breakage (even a transient one) is discovered and can be bisected to the individual code change. Since CD typically takes quite a bit longer than CI, deploy processes on busy codebases tend to batch all changes available since the last deploy and process them together. If the deploy is successful, it does not make sense to backfill by deploying older versions.
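
Expressed as a queueing policy, and with all names purely illustrative, the two mainline behaviors might look like this:

```python
# Commits are ordered oldest to newest; these helpers only pick what runs next.
def next_ci_commit(pending: list[str]) -> str:
    """Mainline CI: take the newest commit first; older ones are backfilled later."""
    return pending[-1]

def next_cd_batch(pending: list[str]) -> tuple[str, list[str]]:
    """Mainline CD: deploy the newest commit, batching everything since the last deploy."""
    newest, batched = pending[-1], pending[:-1]
    return newest, batched   # batched commits ride along and are never deployed individually
```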

On a development branch used by a single developer or a small team to iterate, my preferred CI behavior is to favor the most recent commit, even to the extent of aborting an in-progress build for an older change, so that feedback on the newest change arrives quickly (for example, if a bug is noticed and fixed by the developer before the previous CI process completes). For a CD process, aborting in the middle could leave an environment in a broken state, so once started a deploy should follow through.
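
The branch-level policy can be summed up in a few lines; Run here is a hypothetical handle on an executing job, not any particular vendor’s API.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """Hypothetical handle on an executing pipeline run."""
    in_progress: bool = True

    def abort(self) -> None:
        self.in_progress = False

def on_new_commit(current: Run, is_deploy: bool) -> None:
    # Aborting a superseded CI run is safe: it has no side effects on any environment.
    # A deploy in progress is never aborted, since cutting it short could leave the
    # environment half-upgraded; the newer commit simply waits its turn.
    if current.in_progress and not is_deploy:
        current.abort()
```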

How to solve the CI-CD differences in a single product

The migration pattern I have seen is for a CI system to be adopted for some CD tasks and to gradually grow CD-related features. This section therefore assumes a system that is feature-complete for CI and wants to add robust support for delivery and continuous deployment by addressing all of the points raised above.

Serialization of continuous deployment requires the system to have a queuing capability once the workflow transitions from CI to CD. This can be implemented via an explicit queue or via locking and an implicit queue for the lock. There should be a separate queue per deployment target, not just per service/repository.
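
Here is a minimal in-process sketch of that serialization, with one lock per deployment target. A real multi-node CI/CD system would need a distributed lock or an explicit queue service, but the shape is the same.

```python
# Per-target serialization: deploys to "staging" and "production" can proceed
# in parallel, while two deploys to the same target queue up behind each other.
import threading
from collections import defaultdict

_target_locks: dict[str, threading.Lock] = defaultdict(threading.Lock)

def run_deploy(target: str, deploy_fn) -> None:
    with _target_locks[target]:   # implicit queue: callers block here in arrival order
        deploy_fn(target)
```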

The use case of re-triggering a previously performed delivery process requires both a user interface and an API for launching the deploy part of the workflow with the necessary arguments, the most important of which is the deployment target. In practice, other configurable inputs to the deploy process may need to be provided, so flexible options for extra arguments would be ideal. This use case also does not require running the continuous integration parts again, and can go straight to delivery.
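
The API side might look roughly like this; the endpoint, token, and parameter names are hypothetical and not any particular vendor’s interface.

```python
# Sketch of re-triggering only the delivery part of a workflow, reusing an
# artifact produced by an earlier CI run. Endpoint and fields are assumptions.
import requests

def trigger_deploy(artifact_id: str, target: str, extra: dict | None = None) -> None:
    payload = {
        "artifact": artifact_id,     # reuse the artifact built by the earlier CI run
        "target": target,            # the most important argument: where to deploy
        "parameters": extra or {},   # room for other deploy-time inputs
    }
    resp = requests.post(
        "https://ci.example.com/api/deployments",
        json=payload,
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()
```

Calling something like trigger_deploy("myservice-abc123", "staging") would then go straight to delivery without re-running the integration steps.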

The time consumed by continuous delivery processes calls for sensible time limits (it is reasonable to expect a CI process to finish within an hour, but it might not be reasonable to expect a deploy process to always terminate within an hour of workflow start, especially if queuing is involved). Having to queue without doing useful work also affects resource management: for CI systems that allocate resources to a particular job when the job begins, does it still make sense to hold on to those resources while waiting in a queue, or is it possible to release them and reacquire them once the job resumes?

The difference in behavior under frequent changes (batching of CD, possible cancelling of jobs at CI stage but not CD stage) requires the system to recognize when a boundary between CI and CD parts of a workflow is crossed, and to have good user experience around the configured behavior (for example, it should be easy to see that a given CI run did not have a CD run because it was batched with other changes, and it should be easy to jump from the CI run to the corresponding CD run).

Final words

For infrequently touched source code with simple deployment processes and a single deployment target, the issues listed above are unlikely to occur in practice. Many currently available CI services may already be used to perform both integration and deployment tasks for such repositories, streamlining the workflow from source to running code.

Currently, a single-step deployment process within a CI pipeline is a recurring pattern. From the CI perspective, it is a single invocation of a deployment tool or a single API call. While in the latter case the integration between CI and CD is nominal at best, the former does provide benefits (there is no need to run a separate CD system watching commits or waiting for a trigger to start the deploy, whose only purpose is to run the same deployment tool). I found multiple examples of this workflow, yet I doubt that it scales up and handles all of the potential issues identified above, even if the deployment tool already provides locking functionality.

There are several workarounds and tools that help fit a more complex workflow into a typical CI pipeline, but none of the CI systems we currently use have a complete answer to all of the open questions. For complex deployment requirements and multiple environments, separate CI and CD systems could be the optimal choice for now. Both CI and CD vendors could provide easy ways to integrate their solutions with the most popular complementary systems; this would stave off potential competitors offering an all-in-one CI and CD solution.

For vendors who want to be an all-in-one solution: it is not easy. There are shared building blocks, but integration and deployment requirements and environments have a lot of differences as well. Forcing a tool designed for one use case to fit the other will cause conceptual strain, misfit, and poor user experience.

The ideal solution would appear to have an architecture not unlike that of Microsoft Office: two tools serving quite different purposes, like Word and Excel, that nevertheless are designed to integrate together seamlessly and leverage shared components in their implementation for efficiency. I can’t wait to give a tool like this a go, once it shows up!