Automating Canary Deployments of an Angular Application on DigitalOcean Kubernetes with Spinnaker and Playwright


The primary source of deployment anxiety for our front-end platform was the all-or-nothing nature of our release process. Pushing a new version of our Angular application, built with UnoCSS for its utility-first styling, was a high-stakes event. A bug slipping through manual QA could impact 100% of our users, forcing a frantic, disruptive rollback. The core technical pain point was a lack of a reliable, automated mechanism to de-risk releases by exposing a new version to a subset of traffic before a full rollout. This led to a decision to build a robust, zero-downtime canary deployment pipeline.

Our initial concept was to leverage a toolchain that could handle complex deployment strategies natively, manage infrastructure declaratively, and provide bulletproof automated validation. Simple CI tools, while excellent for building and testing, often require extensive and brittle scripting to orchestrate canary releases. We needed an orchestrator.

This is why we selected Spinnaker. Its entire design philosophy revolves around continuous delivery and sophisticated deployment patterns like canaries. It treats cloud resources as first-class citizens, which was critical for our infrastructure on DigitalOcean. For infrastructure, DigitalOcean Kubernetes (DOKS) provided the necessary managed environment, abstracting away the control plane’s complexity while giving us direct access to Kubernetes APIs. The final piece was validation. A canary is useless if you can’t verify its correctness. Playwright was chosen for its robust end-to-end testing capabilities, allowing us to simulate real user flows against the canary instance and make an automated go/no-go decision.

Phase 1: Production-Grade Application Containerization

Before any deployment orchestration, the application itself needs to be correctly containerized. A common mistake is to create bloated Docker images that include build-time dependencies. We employ a multi-stage Dockerfile to ensure our final image is lean, containing only the Nginx server and the compiled static assets from our Angular and UnoCSS build.

# Dockerfile

# ---- Stage 1: Build ----
# Use a specific Node.js version for build reproducibility.
FROM node:18.18.0-alpine AS build

WORKDIR /app

# Copy package files and install dependencies.
# This layer is cached unless package.json or package-lock.json changes.
COPY package.json package-lock.json ./
RUN npm ci --loglevel warn

# Copy the rest of the application source code.
COPY . .

# Run the production build. The Angular CLI handles tree-shaking and optimization.
# UnoCSS is integrated into the build process via its webpack plugin.
# The --configuration=production flag is critical.
RUN npm run build -- --configuration=production

# ---- Stage 2: Serve ----
# Use a minimal, hardened Nginx image.
FROM nginx:1.25.3-alpine

# Remove the default Nginx configuration.
RUN rm /etc/nginx/conf.d/default.conf

# Copy our custom Nginx configuration.
# This config is tailored for a Single Page Application (SPA).
COPY nginx.conf /etc/nginx/conf.d/

# Copy the compiled application artifacts from the 'build' stage.
COPY --from=build /app/dist/my-angular-app /usr/share/nginx/html

# Expose the standard HTTP port.
EXPOSE 80

# The default Nginx command will start the server.
CMD ["nginx", "-g", "daemon off;"]

The accompanying nginx.conf is crucial for an Angular SPA, as it must handle client-side routing by falling back to index.html for any request path that does not match a file on disk.

# nginx.conf
server {
  listen 80;
  server_name localhost;

  # Root directory for the static files.
  root /usr/share/nginx/html;
  index index.html index.htm;

  # Enable Gzip compression for better performance.
  gzip on;
  gzip_vary on;
  gzip_min_length 1024;
  gzip_proxied expired no-cache no-store private auth;
  gzip_types text/plain text/css text/xml text/javascript application/javascript application/x-javascript application/json application/xml image/svg+xml;
  gzip_disable "MSIE [1-6]\.";

  location / {
    # This is the key for SPA routing.
    # If a file is not found, fall back to index.html.
    try_files $uri $uri/ /index.html;
  }

  # Add long-lived cache headers for static assets.
  # Angular CLI adds hashes to filenames, so we can cache them aggressively.
  location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
    expires 1y;
    add_header Cache-Control "public";
  }
}

With this setup, we can build and push a versioned, production-ready image to our DigitalOcean Container Registry:

# Authenticate with DigitalOcean Container Registry
doctl registry login

# Build and tag the image
VERSION=$(node -p "require('./package.json').version")
IMAGE_NAME="registry.digitalocean.com/my-registry/my-angular-app:${VERSION}"
docker build -t ${IMAGE_NAME} .

# Push the image
docker push ${IMAGE_NAME}

Phase 2: Kubernetes Manifests for Canary Traffic Management

The core of our canary strategy lies in how we structure our Kubernetes deployments. Instead of a single deployment, we manage two: a baseline (the stable production version) and a canary (the new, unverified version). A single Kubernetes Service selects pods from both deployments, so standard kube-proxy load balancing splits traffic roughly in proportion to the number of running pods in each deployment.

Here is the manifest for the baseline deployment. Spinnaker will manage the image tag in this file.

# k8s/deployment-baseline.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: angular-app-baseline
  namespace: production
  labels:
    app: angular-app
    track: stable
spec:
  replicas: 4 # Maintain a healthy number of baseline pods
  selector:
    matchLabels:
      app: angular-app
      track: stable # keep this Deployment's selector from overlapping with the canary's
  template:
    metadata:
      labels:
        app: angular-app
        # The 'track' label distinguishes baseline pods from canary pods
        track: stable
    spec:
      containers:
      - name: angular-app
        # Spinnaker will replace this with the specific stable version
        image: registry.digitalocean.com/my-registry/my-angular-app:placeholder
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"

The canary deployment is almost identical but has a different name and track label, and it’s initially deployed with zero replicas.

# k8s/deployment-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: angular-app-canary
  namespace: production
  labels:
    app: angular-app
    track: canary
spec:
  replicas: 0 # Canary starts with no pods
  selector:
    matchLabels:
      app: angular-app
      track: canary # keep this Deployment's selector from overlapping with the baseline's
  template:
    metadata:
      labels:
        app: angular-app
        track: canary
    spec:
      containers:
      - name: angular-app
        # Spinnaker will inject the new version to test here
        image: registry.digitalocean.com/my-registry/my-angular-app:placeholder
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"

The Service ties them together. The pitfall here is making the Service's selector too specific. By selecting only on app: angular-app, it routes traffic to any pod carrying that label, regardless of whether it is stable or canary, which is exactly what lets the pod ratio drive the traffic split.

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: angular-app-svc
  namespace: production
spec:
  type: LoadBalancer # Exposes the service via a DigitalOcean Load Balancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    # This selector is key: it targets pods from BOTH deployments.
    app: angular-app

Phase 3: The Spinnaker Pipeline Orchestration

With the building blocks in place, we can construct the Spinnaker pipeline, which is itself defined as a JSON or YAML file. Spinnaker's UI is useful for building it, but in a real-world project the pipeline definition should be version-controlled; a skeleton is sketched after the flow diagram below.

The pipeline flow is visualized below:

graph TD
    A[Trigger: New Image in Registry] --> B{Find Baseline Image};
    B --> C[Deploy Canary];
    C --> D[Run Playwright E2E Tests];
    D --> E{Tests Passed?};
    E -- Yes --> F[Promote to Production];
    F --> G[Cleanup Canary];
    E -- No --> H[Rollback: Destroy Canary];
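
As a rough, hedged skeleton of that version-controlled definition: the stage types, the refId/requisiteStageRefIds wiring, and the limitConcurrent flag follow Spinnaker's pipeline format as we understand it, while the application name and stage ordering simply mirror the diagram above. The rollback path on test failure is configured through the stages' execution options rather than shown here.

# pipeline.yaml (illustrative skeleton, not copied from our repository)
application: angular-app
name: angular-app-canary-pipeline
limitConcurrent: true
triggers:
  - type: docker          # fires on a new image tag; detailed in the next section
    enabled: true
stages:
  - refId: "1"
    type: deployManifest
    name: Deploy Canary
  - refId: "2"
    requisiteStageRefIds: ["1"]
    type: runJobManifest
    name: Run Playwright E2E Tests
  - refId: "3"
    requisiteStageRefIds: ["2"]
    type: deployManifest
    name: Promote to Production
  - refId: "4"
    requisiteStageRefIds: ["3"]
    type: scaleManifest
    name: Cleanup Canary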

Here’s a breakdown of the key stages:

1. Configuration: Trigger and Parameters

The pipeline triggers automatically when a new image tag appears in our DigitalOcean Container Registry. Spinnaker uses this trigger to extract the image digest and tag, which are then available throughout the pipeline execution.
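
A hedged sketch of that trigger as it might appear in the pipeline definition; the Docker Registry account name (digitalocean-registry) and the field names are assumptions based on Spinnaker's Docker trigger type, not an excerpt from our configuration:

# Trigger fragment (illustrative; assumes a Docker Registry account named 'digitalocean-registry' is configured in Spinnaker)
triggers:
  - type: docker
    enabled: true
    account: digitalocean-registry
    organization: my-registry
    repository: my-registry/my-angular-app
    tag: '.*'            # any new tag starts the pipeline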

2. Stage: Deploy Canary

This stage uses Spinnaker’s “Deploy (Manifest)” step. It takes the k8s/deployment-canary.yaml text as input.

  • Action: It applies the canary deployment manifest to our DOKS cluster.
  • Overrides: Critically, it overrides two values (see the sketch after this list):
    1. spec.replicas: Set to 1. This spins up a single canary pod. If the baseline has 4 replicas, this directs roughly 20% of traffic to the new version.
    2. spec.template.spec.containers[0].image: Set to the image reference from the pipeline trigger. This ensures we are deploying the new version.
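
One way to express these overrides is to supply the canary manifest as inline text and parameterize it with Spinnaker pipeline expressions, the same mechanism the Job manifest in the next stage uses with ${#uuid()}. The fragment below is a hedged sketch: the deployManifest field names and the ${trigger['tag']} expression reflect our understanding of Spinnaker's Kubernetes provider, and the doks-production account name is illustrative.

# Deploy Canary stage fragment (illustrative)
- type: deployManifest
  name: Deploy Canary
  account: doks-production         # the DOKS cluster registered in Spinnaker
  cloudProvider: kubernetes
  source: text
  manifests:
    - apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: angular-app-canary
        namespace: production
        # labels/selector omitted for brevity; identical to k8s/deployment-canary.yaml
      spec:
        replicas: 1                # override 1: a single canary pod (~20% of traffic next to 4 baseline pods)
        template:
          spec:
            containers:
              - name: angular-app
                # override 2: the tag supplied by the Docker trigger (assumed expression)
                image: registry.digitalocean.com/my-registry/my-angular-app:${trigger['tag']}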

3. Stage: Run Playwright E2E Validation

This is the most complex integration. Spinnaker does not run tests directly. Instead, we use a “Run Job (Manifest)” stage to launch a Kubernetes Job that executes our Playwright test suite.

Here is the manifest for the validation Job:

# k8s/playwright-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: playwright-validation-run-${#uuid()} # Unique name for each run
  namespace: testing
spec:
  template:
    spec:
      containers:
      - name: playwright-runner
        image: mcr.microsoft.com/playwright:v1.39.0 # Official Playwright image
        command: ["/bin/sh", "-c"]
        args:
        - |
          # 1. Clone the test repository
          git clone https://<PAT>@github.com/my-org/my-e2e-tests.git
          cd my-e2e-tests

          # 2. Install dependencies
          npm ci

          # 3. Run the tests against the production service endpoint.
          # Because the service load balances, some tests will hit the canary.
          # The exit code determines the success or failure of the Spinnaker stage.
          npx playwright test --reporter=line
        env:
        - name: PLAYWRIGHT_BASE_URL
          # The service endpoint in the 'production' namespace
          value: "http://angular-app-svc.production.svc.cluster.local"
      restartPolicy: Never
  backoffLimit: 0 # Do not retry the job on failure
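
The stage that launches this Job is, roughly, the fragment below. The runJobManifest stage type and its fields are our best understanding of Spinnaker's "Run Job (Manifest)" stage, and the doks-production account name is illustrative. Spinnaker waits for the Job to complete and fails the stage if the Job fails, which is what turns the test run into an automated gate.

# Run Playwright E2E Tests stage fragment (illustrative)
- type: runJobManifest
  name: Run Playwright E2E Tests
  account: doks-production
  cloudProvider: kubernetes
  source: text
  manifest:
    # the k8s/playwright-job.yaml manifest shown above, inlined
    apiVersion: batch/v1
    kind: Job
    # ...rest of the Job spec exactly as above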

The Playwright test itself needs to be robust. It shouldn't just check for the existence of an element; it should verify a critical user journey, such as a checkout flow.

// tests/checkout.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Critical Checkout Flow', () => {
  test('should allow a user to add an item to the cart and proceed to checkout', async ({ page }) => {
    // This assumes the baseURL is set in the config to our service endpoint
    await page.goto('/');

    // 1. Verify a key element rendered by the new UnoCSS styles
    const heroButton = page.locator('button.bg-primary-500');
    await expect(heroButton).toBeVisible({ timeout: 10000 });
    await expect(heroButton).toHaveText('Shop Now');

    // 2. Navigate and perform an action
    await page.locator('.product-card[data-product-id="abc-123"]').click();
    await page.getByRole('button', { name: 'Add to Cart' }).click();

    // 3. Verify state change
    await expect(page.locator('.cart-item-count')).toHaveText('1');

    // 4. Intercept a critical API call to ensure the payload is correct
    // This guards against front-end/back-end contract regressions.
    let checkoutApiRequest: any = null;
    page.on('request', request => {
      if (request.url().includes('/api/v2/checkout')) {
        checkoutApiRequest = request.postDataJSON();
      }
    });

    await page.getByRole('link', { name: 'Checkout' }).click();
    await page.getByRole('button', { name: 'Confirm Purchase' }).click();

    // 5. Assert on the intercepted network call
    await expect(page.locator('.order-confirmation')).toBeVisible();
    expect(checkoutApiRequest).not.toBeNull();
    expect(checkoutApiRequest.items[0].id).toBe('abc-123');
    expect(checkoutApiRequest.metadata.clientVersion).toBeDefined(); // Check for new metadata
  });
});

The key is that the test runner's exit code determines the outcome: a non-zero exit marks the Kubernetes Job as failed, that failure is propagated back to the Spinnaker stage, and Spinnaker then decides whether to proceed or halt the pipeline.

4. Stage: Promote to Production

If the Playwright job succeeds, the pipeline continues to the promotion stage. This is another “Deploy (Manifest)” stage, but this time it modifies the angular-app-baseline deployment.

  • Action: It applies the k8s/deployment-baseline.yaml manifest.
  • Overrides: It only overrides one value:
    1. spec.template.spec.containers[0].image: Set to the new image from the pipeline trigger.

Kubernetes’ rolling update strategy will then safely replace the old baseline pods with the new version, one by one, ensuring no downtime.
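
RollingUpdate is already the default strategy for Deployments; to make the zero-downtime behavior explicit, the baseline Deployment can pin it down. This fragment is an optional addition to k8s/deployment-baseline.yaml, not something Spinnaker requires:

# Optional addition to k8s/deployment-baseline.yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # allow one extra pod while new pods come up
      maxUnavailable: 0    # never drop below the desired number of ready baseline pods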

5. Stage: Cleanup Canary

Once the baseline is fully updated, the canary is no longer needed. A final “Scale (Manifest)” or “Delete (Manifest)” stage is executed. We prefer scaling the canary deployment down to zero replicas.

  • Target: deployment angular-app-canary in the production namespace.
  • Action: Scale to 0 replicas (a stage sketch follows this list).
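
A hedged sketch of that stage; the scaleManifest field names are assumptions based on Spinnaker's "Scale (Manifest)" stage, and the doks-production account name is again illustrative:

# Cleanup Canary stage fragment (illustrative)
- type: scaleManifest
  name: Cleanup Canary
  account: doks-production
  cloudProvider: kubernetes
  location: production                         # namespace
  manifestName: deployment angular-app-canary
  replicas: 0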

This preserves the canary deployment object for potential reuse but removes its pods, so it no longer receives traffic or consumes resources. If the pipeline fails at the testing stage, the failure path triggers this same cleanup stage, effectively rolling back the change by simply removing the canary.

The resulting system provides a high degree of confidence. A developer commit triggers a process that results in a new version being tested against a live, small percentage of production traffic with a comprehensive E2E suite. Only upon successful validation is the change promoted. The entire flow is hands-off, reducing both the risk and the toil associated with releases.

This pipeline architecture, while effective, is not the final evolution. The current validation is a binary pass/fail from Playwright. A more sophisticated approach would integrate Spinnaker’s automated canary analysis engine, Kayenta, to monitor Prometheus metrics from both the baseline and canary pods over a period of time, making a data-driven decision based on error rates, latency percentiles, and business metrics. Furthermore, traffic splitting is currently coarse-grained and dependent on pod ratios; for fine-grained control (e.g., 1%, 5%, 10% traffic), integrating a service mesh like Istio or Linkerd would be the next logical step, allowing Spinnaker to manipulate traffic routing rules directly. Finally, this workflow does not address database schema migrations, which must be carefully managed as a separate, but coordinated, process to ensure backward compatibility during the canary period.

