Our mobile development workflow was grinding to a halt, choked by a single, monolithic staging environment. Every pull request deployed to the same backend, leading to constant dependency conflicts, overwritten feature flags, and a QA process riddled with “works on my machine” dead-ends. The productivity cost was unacceptable. We needed a system that could spin up a complete, isolated, and fully functional backend environment for every single mobile app PR, and tear it down just as easily.
The initial concept was a full embrace of GitOps. A developer opens a pull request in the React Native repository, and automation takes over. This automation needed to provision a unique set of microservices in our Kubernetes cluster, configure them, and—most critically—securely isolate them from all other environments. Developers then needed a simple, immediate way to get the endpoint and status for their specific environment, preferably right on their mobile devices where they do their testing.
Our technology selection process was driven by this need for dynamic, secure, and developer-friendly automation.
- **CI/CD:** GitHub Actions was the obvious choice. Our source code resides there, and its native integration with pull request events provides the perfect trigger for our workflow.
- **Dynamic Configuration & Service Discovery:** We needed a central source of truth for these ephemeral environments. Consul was selected for its robust Key/Value (KV) store, which we decided to use as a dynamic registry for every preview environment's metadata: its unique URL, associated PR number, deployment status, and feature flags (see the KV sketch after this list). This went beyond simple service discovery; it became the control plane's database.
- **Network Security:** This was the hardest problem. Simply creating a new Kubernetes namespace for each PR felt too heavy and slow for the rapid churn we expected. More importantly, namespace-based `NetworkPolicy` objects felt insufficient. We needed to guarantee that `pr-123-backend` could never, under any circumstances, communicate with `pr-124-backend`. Cilium was the definitive answer. Its eBPF-based networking and identity-aware `CiliumNetworkPolicy` offered the L3/L4 and even L7 granularity required to build a true multi-tenant security model within a single cluster, based on pod labels rather than network subnets.
- **Developer Interface:** A web page URL posted in a PR comment is standard, but we wanted to do better. Our team tests on physical devices. We envisioned a simple React Native dashboard app, a "control center" for developers. To avoid building and maintaining a complex backend for this app, we opted for a hybrid approach: the React Native app would primarily be a shell containing a WebView pointing to a Next.js frontend that displays the environment status. To make this feel instantaneous, we would leverage Incremental Static Regeneration (ISR). The status pages would be statically generated for blazing-fast loads but could be revalidated on demand, pulling the latest status directly from Consul KV. This gave us a real-time feel without the complexity of WebSockets.
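To make the "control plane database" idea concrete, here is a rough sketch of the KV subtree a single preview environment ends up with, as seen through the Consul CLI. The keys mirror what the workflow below writes; the values and the `feature_flags` entry are purely illustrative.

```bash
# Hypothetical KV tree for a single preview environment (PR #123), read with
# the Consul CLI. Keys match what the GitHub Actions workflow below writes;
# the values and the feature_flags entry are examples, not real data.
$ consul kv get -recurse previews/mobile/pr-123/
previews/mobile/pr-123/status:ready
previews/mobile/pr-123/pr_title:Example PR title
previews/mobile/pr-123/sha:4f2a9c1
previews/mobile/pr-123/url:http://mobile-pr-123.previews.ourcompany.com
previews/mobile/pr-123/feature_flags:{"example_flag":true}
```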
The following sections document the implementation of this system, including the scripts, configurations, and the inevitable problems we solved along the way.
The Core GitOps Workflow in GitHub Actions
Everything starts with the pull request. We defined a single workflow that triggers on `pull_request` events targeting our `main` branch.
```yaml
# .github/workflows/preview-environment.yml
name: Mobile Preview Environment
on:
pull_request:
types: [opened, synchronize, reopened]
branches:
- 'main'
env:
# Static environment variables
GKE_PROJECT: 'our-gcp-project-id'
GKE_ZONE: 'us-central1-c'
GKE_CLUSTER: 'main-cluster'
CONSUL_HTTP_ADDR: ${{ secrets.CONSUL_ADDR }}
CONSUL_HTTP_TOKEN: ${{ secrets.CONSUL_TOKEN }}
# Dynamic variables defined in jobs
jobs:
# Step 1: Generate a unique name and validate the PR
setup:
runs-on: ubuntu-latest
outputs:
env_name: ${{ steps.vars.outputs.env_name }}
pr_number: ${{ steps.vars.outputs.pr_number }}
steps:
- name: 'Generate Environment Name'
id: vars
run: |
# The environment name must be DNS-1123 compliant.
PR_NUMBER=${{ github.event.pull_request.number }}
ENV_NAME="mobile-pr-${PR_NUMBER}"
echo "ENV_NAME=${ENV_NAME}" >> $GITHUB_ENV
echo "PR_NUMBER=${PR_NUMBER}" >> $GITHUB_ENV
echo "env_name=${ENV_NAME}" >> $GITHUB_OUTPUT
echo "pr_number=${PR_NUMBER}" >> $GITHUB_OUTPUT
# Step 2: Deploy the backend services and network policies
deploy:
needs: setup
runs-on: ubuntu-latest
permissions:
contents: 'read'
id-token: 'write' # Required for GKE Workload Identity Federation
steps:
- name: 'Checkout Repository'
uses: actions/checkout@v3
- name: 'Authenticate to Google Cloud'
uses: 'google-github-actions/auth@v1'
with:
workload_identity_provider: 'projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider'
service_account: '[email protected]'
- name: 'Get GKE Credentials'
uses: 'google-github-actions/get-gke-credentials@v1'
with:
cluster_name: ${{ env.GKE_CLUSTER }}
location: ${{ env.GKE_ZONE }}
- name: 'Generate K8s Manifests'
id: generate_manifests
run: |
# This script takes the environment name and outputs templated Kubernetes YAML
# A real-world scenario might use Kustomize or Helm, but for clarity,
# we use simple shell templating (a heredoc with variable expansion) here.
export PREVIEW_ENV_NAME=${{ needs.setup.outputs.env_name }}
export PR_NUMBER=${{ needs.setup.outputs.pr_number }}
# This script generates a single multi-document YAML file
./scripts/generate-manifests.sh > generated-manifests.yaml
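# NOTE: the Consul steps below assume the `consul` CLI is available on the
# runner; install it in an earlier step (or use a runner image that ships it)
# if your environment does not already provide it.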
- name: 'Update Consul KV - Deploying'
run: |
# Use the Consul CLI to register the initial state
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/status" "deploying"
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/pr_title" "${{ github.event.pull_request.title }}"
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/sha" "${{ github.sha }}"
- name: 'Apply Kubernetes Manifests'
run: |
kubectl apply -f generated-manifests.yaml
- name: 'Wait for Ingress and Update Consul'
id: wait_for_ingress
run: |
echo "Waiting for Ingress to be provisioned..."
# This is a critical step. The Ingress controller needs time to provision a load balancer and get an IP.
# A robust implementation would have a more intelligent polling mechanism.
for i in {1..30}; do
INGRESS_IP=$(kubectl get ingress -l env=${{ needs.setup.outputs.env_name }} -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
if [ -n "$INGRESS_IP" ]; then
PREVIEW_URL="http://${{ needs.setup.outputs.env_name }}.previews.ourcompany.com"
echo "Ingress provisioned at IP: $INGRESS_IP"
echo "Preview URL: $PREVIEW_URL"
# Final state update in Consul
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/status" "ready"
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/url" "$PREVIEW_URL"
echo "url=${PREVIEW_URL}" >> $GITHUB_OUTPUT
exit 0
fi
echo "Attempt $i: Ingress not ready, waiting 10 seconds..."
sleep 10
done
echo "Error: Timed out waiting for Ingress."
# Update Consul to reflect the failure
consul kv put "previews/mobile/pr-${{ needs.setup.outputs.pr_number }}/status" "failed"
exit 1
# Step 3: Trigger ISR revalidation and comment on the PR
- name: 'Trigger ISR Revalidation'
run: |
# Call a secure API endpoint on our Next.js dashboard to purge the cache for this specific PR page.
# This makes the status update appear instantaneous for the developer.
curl -X POST "${{ secrets.REVALIDATE_URL }}?secret=${{ secrets.REVALIDATE_TOKEN }}&slug=pr-${{ needs.setup.outputs.pr_number }}"
- name: 'Comment on PR'
uses: actions/github-script@v6
with:
script: |
const url = "${{ steps.wait_for_ingress.outputs.url }}";
const deepLink = "devdashboard://preview/pr-${{ needs.setup.outputs.pr_number }}";
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `🚀 Preview environment is ready!\n\n- **Web URL:** ${url}\n- **Open in Dashboard:** [${deepLink}](${deepLink})`
})
```
Dynamic Manifests and Cilium’s Isolation Model
We don't commit Kubernetes YAMLs for each PR. They are generated on the fly. The key is to inject the unique environment label (`env: mobile-pr-123`) into every resource.
Here is the core generation script (`scripts/generate-manifests.sh`):
```bash
#!/bin/bash
# A simplified script to demonstrate templating.
# In production, consider using Kustomize or Helm for better management.
if [[ -z "$PREVIEW_ENV_NAME" || -z "$PR_NUMBER" ]]; then
echo "Error: PREVIEW_ENV_NAME and PR_NUMBER must be set."
exit 1
fi
# Template for the backend service deployment
cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api-${PREVIEW_ENV_NAME}
labels:
app: backend-api
env: ${PREVIEW_ENV_NAME} # Critical label for policy enforcement
spec:
replicas: 1
selector:
matchLabels:
app: backend-api
env: ${PREVIEW_ENV_NAME}
template:
metadata:
labels:
app: backend-api
env: ${PREVIEW_ENV_NAME}
spec:
containers:
- name: server
image: gcr.io/our-gcp-project-id/backend-api:pr-${PR_NUMBER}
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: backend-api-${PREVIEW_ENV_NAME}
labels:
app: backend-api
env: ${PREVIEW_ENV_NAME}
spec:
selector:
app: backend-api
env: ${PREVIEW_ENV_NAME}
ports:
- protocol: TCP
port: 80
targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-${PREVIEW_ENV_NAME}
labels:
env: ${PREVIEW_ENV_NAME}
annotations:
# Annotation for external-dns to create the DNS record
external-dns.alpha.kubernetes.io/hostname: ${PREVIEW_ENV_NAME}.previews.ourcompany.com.
spec:
rules:
- host: ${PREVIEW_ENV_NAME}.previews.ourcompany.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: backend-api-${PREVIEW_ENV_NAME}
port:
number: 80
EOF
```
The most important generated resource is the `CiliumNetworkPolicy`. This is what creates the secure sandbox. A common mistake is to only define `ingress` rules. A truly secure policy must start with a default-deny stance for both ingress and egress.
```yaml
# This content is appended by generate-manifests.sh
---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "cnp-${PREVIEW_ENV_NAME}"
spec:
# The endpointSelector targets all pods within this specific preview environment.
endpointSelector:
matchLabels:
env: ${PREVIEW_ENV_NAME}
# --- INGRESS RULES ---
# By default, all ingress is denied. We only allow specific traffic.
ingress:
- fromEndpoints:
# 1. Allow traffic from the ingress controller so the service is publicly accessible.
# We label our ingress controller pods with 'app: ingress-nginx'.
- matchLabels:
"k8s:app.kubernetes.io/name": "ingress-nginx" # Label may vary based on your ingress controller
# 2. Allow traffic from OTHER pods within the SAME preview environment.
# This is key for multi-service environments.
- matchLabels:
env: ${PREVIEW_ENV_NAME}
# 3. Allow traffic from the kube-dns pods for DNS resolution.
- matchLabels:
"k8s:io.kubernetes.pod.namespace": "kube-system"
"k8s:k8s-app": "kube-dns"
ports:
- port: "8080"
protocol: TCP
- port: "53"
protocol: UDP
# --- EGRESS RULES ---
# By default, all egress is denied. We explicitly whitelist destinations.
egress:
- toEndpoints:
# 1. Allow traffic to OTHER pods within the SAME preview environment.
- matchLabels:
env: ${PREVIEW_ENV_NAME}
# 2. Allow traffic to kube-dns for DNS resolution.
- matchLabels:
"k8s:io.kubernetes.pod.namespace": "kube-system"
"k8s:k8s-app": "kube-dns"
ports:
- port: "53"
protocol: UDP
# 3. Allow traffic to external services if needed, e.g., Google APIs.
# Using toCIDR is more secure than allowing all outbound traffic.
- toCIDR:
- "0.0.0.0/0" # In a real project, this should be a specific, limited IP range.
# For example, an external database or a third-party API.
# A wildcard is used here for demonstration but is a security risk.
```

This policy ensures that pods for `pr-123` can talk to each other and resolve DNS, but they are completely blind to pods for `pr-124`. The pitfall here is forgetting DNS: without the egress rule to `kube-dns`, services can't resolve any hostnames, which leads to a common and frustrating debugging session.
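It is worth spot-checking the isolation after a deploy. A quick manual check might look roughly like the sketch below; it assumes both previews run in the same namespace, that the backend image ships `curl`, and that `/healthz` is just an illustrative path (any HTTP status proves connectivity, while a timeout indicates the policy dropped the traffic).

```bash
# Sketch of a manual isolation check between two live preview environments.
# Assumptions: same namespace, curl available in the backend image,
# /healthz used only as an example path.

# Same-environment call: should print an HTTP status code.
kubectl exec deploy/backend-api-mobile-pr-123 -- \
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 \
  http://backend-api-mobile-pr-123/healthz

# Cross-environment call: DNS still resolves (kube-dns egress is allowed),
# but the CiliumNetworkPolicy drops the connection, so curl times out.
kubectl exec deploy/backend-api-mobile-pr-123 -- \
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 \
  http://backend-api-mobile-pr-124/healthz || echo "blocked, as intended"

# The Cilium agent can also list endpoint identities and policy enforcement state:
kubectl -n kube-system exec ds/cilium -- cilium endpoint list | grep mobile-pr
```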
The Consul-Backed ISR Dashboard
The dashboard is a Next.js application. Its core logic resides in the dynamic route page `pages/preview/[slug].js`.
The `getStaticPaths` function tells Next.js that we don't know the paths at build time; they will be generated on demand.
The `getStaticProps` function is where the magic happens. It fetches data for a specific PR from the Consul KV store. The `revalidate` property enables ISR, telling Next.js to serve the cached static page while attempting to regenerate it in the background at most once every 5 seconds.
```javascript
// pages/preview/[slug].js
import Consul from 'consul'; // the package exposes the client as its default export
import { useRouter } from 'next/router';
// Configure the Consul client. In a real application, this configuration
// would come from environment variables.
const consul = new Consul({
host: process.env.CONSUL_HOST || '127.0.0.1',
port: process.env.CONSUL_PORT || '8500',
promisify: true, // needed only on pre-1.0 releases of the client; the 1.x client is promise-based
});
export async function getStaticPaths() {
// We don't pre-render any paths at build time.
// They will be generated on the first request.
return {
paths: [],
fallback: 'blocking', // 'blocking' waits for the page to be generated on first visit
};
}
export async function getStaticProps(context) {
const { slug } = context.params; // e.g., "pr-123"
const prNumber = slug.replace('pr-', '');
try {
const basePath = `previews/mobile/pr-${prNumber}`;
// Fetch all keys for this PR in a single transaction-like operation
const results = await consul.kv.get({ key: basePath, recurse: true });
if (!results || results.length === 0) {
return { notFound: true }; // If no data found, return 404
}
// Process the flat key-value array into a structured object
const previewData = results.reduce((acc, item) => {
// item.Key is like 'previews/mobile/pr-123/status'
// We want the last part of the key.
const key = item.Key.split('/').pop();
acc[key] = item.Value;
return acc;
}, {});
return {
props: {
data: previewData,
prNumber: prNumber,
},
// Incremental Static Regeneration:
// Re-generate the page at most once every 5 seconds.
// This ensures the data is fresh without hammering Consul on every request.
revalidate: 5,
};
} catch (error) {
console.error(`Failed to fetch data from Consul for ${slug}:`, error);
// In case of a Consul connection error, we can return an error prop
// and prevent the page from being generated.
return {
props: {
error: 'Failed to connect to the configuration service.',
prNumber: prNumber,
},
revalidate: 1, // Attempt to re-generate quickly on error
};
}
}
// The React component to render the page
export default function PreviewPage({ data, prNumber, error }) {
const router = useRouter();
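  // Note: with fallback: 'blocking' in getStaticPaths this branch never renders;
  // it only becomes relevant if fallback is switched to true.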
if (router.isFallback) {
return <div>Loading environment details for PR #{prNumber}...</div>;
}
if (error) {
return <div>Error loading data for PR #{prNumber}: {error}</div>;
}
return (
<div>
<h1>Preview for PR #{prNumber}</h1>
<p><strong>Title:</strong> {data.pr_title || 'N/A'}</p>
<p><strong>Status:</strong> <span className={`status-${data.status}`}>{data.status}</span></p>
<p><strong>URL:</strong> <a href={data.url}>{data.url}</a></p>
<p><strong>Commit SHA:</strong> <code>{data.sha}</code></p>
</div>
);
}
```
To make the dashboard feel truly real-time, we added an on-demand revalidation API endpoint. This allows our GitHub Action to force a page refresh the moment the deployment is complete.
```javascript
// pages/api/revalidate.js
export default async function handler(req, res) {
// 1. Check for the secret token to prevent unauthorized access.
if (req.query.secret !== process.env.REVALIDATE_TOKEN) {
return res.status(401).json({ message: 'Invalid token' });
}
// 2. Get the slug from the query parameters.
const slug = req.query.slug;
if (!slug) {
return res.status(400).json({ message: 'Slug is required' });
}
try {
// 3. Trigger the revalidation for the specific page.
// The path must match the page file structure: '/preview/[slug]'
await res.revalidate(`/preview/${slug}`);
console.log(`Revalidation triggered for: /preview/${slug}`);
return res.json({ revalidated: true });
} catch (err) {
// If there was an error, Next.js will continue to show the last successfully generated page.
console.error(`Error revalidating /preview/${slug}:`, err);
return res.status(500).send('Error revalidating');
}
}
```
The Final Picture
This architecture connects a disparate set of technologies into a cohesive, automated system that directly addresses a critical developer productivity bottleneck.
```mermaid
graph TD
    subgraph GitHub
        A[Dev pushes to PR] --> B{GitHub Actions Workflow};
    end
    subgraph "Kubernetes Cluster (GKE)"
        D[Dynamic Manifests] --> E[kubectl apply];
        E --> F[Deployment env=pr-123];
        E --> G[Service env=pr-123];
        E --> H[Ingress env=pr-123];
        E --> I[CiliumNetworkPolicy env=pr-123];
        I -- Isolates --> F;
        I -- Isolates --> G;
    end
    subgraph "Control Plane"
        J[Consul KV];
        K[Next.js ISR App];
    end
    subgraph "Developer Experience"
        L[PR Comment];
        M[React Native Dashboard];
    end
    B --> C{Generate Manifests};
    C --> D;
    B --> K1[Update Consul KV: 'deploying'];
    K1 --> J;
    H --> H1{Ingress Controller};
    H1 --> G;
    G --> F;
    B --> K2[Wait for Ingress];
    K2 --> K3[Update Consul KV: 'ready', URL];
    K3 --> J;
    K3 --> R[Trigger Revalidation API];
    R --> K;
    K3 --> L;
    K --> M;
    M -- Displays Data From --> K;
    K -- getStaticProps reads from --> J;
```
The system is functional, but it’s not without its own set of trade-offs and future considerations. The PR teardown process, which removes all Kubernetes resources and Consul keys when a PR is closed or merged, is currently handled by a separate nightly garbage collection job. A webhook-driven, real-time cleanup would be more efficient but adds another layer of complexity and potential failure points. Furthermore, each preview environment consumes non-trivial cluster resources; we are actively exploring lighter-weight virtualization like vcluster to reduce the per-environment overhead. The current reliance on Consul KV as a single point of failure for the dashboard is another area for improvement; a caching layer or a more resilient data store architecture could mitigate this risk.
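For reference, the per-environment cleanup itself is small whether it runs from the nightly job or a future `pull_request` "closed" trigger. A minimal sketch, reusing the label convention and KV prefix from above (the PR number and the revalidation variables are placeholders):

```bash
# Hedged sketch of the per-environment cleanup, whether run by the nightly GC
# job or a future pull_request 'closed' workflow. Assumes the same labels and
# KV prefix as above; PR_NUMBER, REVALIDATE_URL and REVALIDATE_TOKEN are placeholders.
PR_NUMBER=123
ENV_NAME="mobile-pr-${PR_NUMBER}"

# Remove every Kubernetes resource carrying the environment label.
kubectl delete deployment,service,ingress,ciliumnetworkpolicy \
  -l env="${ENV_NAME}" --ignore-not-found

# Drop the environment's subtree from Consul KV.
consul kv delete -recurse "previews/mobile/pr-${PR_NUMBER}/"

# Revalidate the dashboard page so it stops advertising a dead URL.
curl -X POST "${REVALIDATE_URL}?secret=${REVALIDATE_TOKEN}&slug=pr-${PR_NUMBER}"
```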